DICHOTOMOUS EXPERIMENTS

By

Russell N. Bradt

1. Introduction and Summary.

It may frequently happen that a researcher, wishing to decide which one of a set of alternatives to accept, finds that there are several experiments available to him which he might perform to guide him in reaching his decision. Thus, he is faced with making a preliminary decision as to which experiment or experiments he is to perform. If he admits the possibility of performing more than one experiment, then the questions of how many, which ones, and in what order, arise. It is such questions that come under the heading of comparison and design of experiments.

In its most general formulation, a sample space is an ordered quadruple, $(Z, \mathcal{B}, \Omega, P)$, where $Z$ is an arbitrary set, $\mathcal{B}$ is a Borel field of subsets of $Z$, $\Omega$ is an arbitrary set, and $P$ is a function defined on $\mathcal{B} \times \Omega$ with the property that for each $\omega \in \Omega$, the restriction of $P$ to $\mathcal{B} \times \{\omega\}$ is a probability measure on $\mathcal{B}$. In this setting an experiment is a sub-Borel field of $\mathcal{B}$. If $\mathcal{C}$ is a Borel field of subsets of a set $W$ and $T$ is a $\mathcal{B}$-measurable function from $Z$ to $W$, then $T$ is a random variable and

$\mathcal{B}_T = \{B : \text{for some } E \in \mathcal{C},\ B = \{z : T(z) \in E\}\}$

is a sub-Borel field of $\mathcal{B}$. $\mathcal{B}_T$ is called the experiment associated with the random variable $T$. Keeping in mind that many random variables may be associated with the same experiment, and that to view an experiment as a sub-Borel field is therefore the more basic approach, no confusion will result in this paper from identifying random variable with experiment.

Since the random variables dealt with in this paper are all real valued, to say that an experiment is available to the researcher is to say that there is a real random variable which he can observe and whose distribution is known for each $\omega \in \Omega$.

While much of the general theory of the design problem has been developed, e.g., by Wald [1] and Magwire [2], actual solutions of particular problems, especially of the sequential type, have not been obtained. This paper stems from work towards solving the design problem for particular cases. Attention is restricted to dichotomous experiments; i.e., $\Omega$ is assumed to contain but two elements which will be called hypotheses and denoted by $H_1$ and $H_2$. It is supposed that one is required to decide which hypothesis is true and that a loss of one unit is suffered if the false hypothesis is chosen while no loss occurs if the true one is chosen. Further, $\xi$ will denote the a priori probability that $H_1$ is true and the criterion to be used in comparing experiments will be the Bayes risks associated with the various experiments.

In Section 2 it is supposed that there are two experiments, i.e., random variables, $X$ and $Y$ available and that but one experiment is allowed. Some conditions for uniform inequalities between the Bayes risk associated with $X$ and that associated with $Y$ are obtained. Certain relations between the Kullback-Leibler information numbers for $X$ and for $Y$ and their Bayes risks are shown. In particular, it is found that a necessary condition that one random variable have a Bayes risk uniformly less than or equal to that of the other is that its Kullback-Leibler information numbers are greater than or equal to those for the other. The case in which the distributions are normal is discussed in some detail and a few remarks are addressed to the matter of viewing the Kullback-Leibler information numbers, in certain special cases, as functions of that transformation, $t$, such that the distribution of $t(X)$ under $H_1$ is the distribution of $X$ under $H_2$.

Section 3 is devoted to the problem of designs in the case of binomial distributions. It is supposed that the two experiments available, $X$ and $Y$, are independent and of equal cost, and that it is given that a total of $n$ experiments is to be performed. Two problems are discussed: What is the best division of the $n$ experiments between $X$'s and $Y$'s if one is to decide this matter before experimentation? What is the best sequential design, i.e., the best rule prescribing, as a function of the results of the preceding experiments, which random variable to observe in the next experiment?

In Section 4, instead of considering the performance of a fixed
number  of  experiments,  the  experimentation  is  supposed  terminated  by  a 
particular  sequential  stopping  rule  and  one  is  interested  in  discovering 
sequential  designs  which  minimize  the  expected  number  of  experiments  that 
will  be  performed. 

In the final section, a somewhat different purpose of experimentation is introduced. Again, $X$ and $Y$ are two real random variables with known distributions under the two hypotheses. A total of $n$ experiments is allowed and a sequential design, telling which random variable to observe at each step, which will maximize the sum of the $n$ observations is sought. The design which requires, at each step, that play which maximizes the expected value of the next observation is considered in particular. For the case in which $X$ and $Y$ have binomial distributions such that $X$ under $H_1$ and $Y$ under $H_2$ have the same distributions and $X$ under $H_2$ and $Y$ under $H_1$ have the same distributions, the problem is known as the 'Two-armed Bandit'. Brief outlines of two methods of attack on the question of the optimal design for the Two-armed Bandit are given. It is a conjecture of Blackwell's that this design is the optimal design. By both methods this conjecture was found to hold true for small values of $n$. Each, however, appears to be too cumbersome in the general case to provide a full proof.

2. Some Relations Between Bayes Risks and the Kullback-Leibler Information Numbers.

2.1 General Results. Of the two hypotheses, $H_1$ and $H_2$, let $H_1$ be true with a priori probability $\xi$ and $H_2$ true with a priori probability $1-\xi$. Suppose that it is required to decide which of the hypotheses is true, suffering a loss of one if the false hypothesis is chosen and no loss otherwise. Further, suppose that $X$ and $Y$ are real random variables having distribution functions $F_i$ and $G_i$, respectively, under hypothesis $H_i$ and with the corresponding densities $f_i$ and $g_i$ with respect to a common measure, $\psi$, such that $f_i > 0$ if and only if $g_i > 0$. An observation either of $X$ or of $Y$ is allowed to assist in making the decision as to the true hypothesis.

Of course, if but one observation were allowed and one were interested only in comparing $X$ and $Y$ for one particular value of $\xi$, the preliminary decision as to whether to observe $X$ or $Y$, i.e., the design problem, reduces to computing the Bayes risk against $\xi$ when using $X$, $R_X(\xi)$, and that when $Y$ is used, $R_Y(\xi)$, and using the random variable corresponding to the smaller risk. Since one is not interested in such a strongly restricted comparison, this criterion will not yield a simple solution, unless $R_X(\xi) \le R_Y(\xi)$ for all $\xi$, or $R_X(\xi) \ge R_Y(\xi)$ for all $\xi$ ($0 \le \xi \le 1$), in which case the choice between $X$ and $Y$ is clear. Furthermore, any criterion for choosing between $X$ and $Y$ should agree with this whenever one risk curve lies uniformly on or below the other.

Considering the statistical games based on $X$ and on $Y$ as S-games ([3]), with $S_X$ and $S_Y$ the respective sets of risk vectors, the condition that $R_X \le R_Y$ is equivalent to $S_X \supset S_Y$, i.e., any risk vector attainable using $Y$ can also be attained by using $X$. Interest in conditions under which $R_X \le R_Y$ is further increased in view of results of Blackwell's [4] that if such is the case, then regardless of the number of actions open to the researcher or the loss function used, the set of risk vectors attainable with $X$ contains that attainable with $Y$.


Let $R_X \le R_Y$ denote that $R_X(\xi) \le R_Y(\xi)$ for all $\xi$. Throughout the paper it will be found very convenient to consider $\xi/(1-\xi)$ and this will regularly be denoted by $\eta$.

Lemma 2.1. Two conditions, each necessary and sufficient, that $R_X \le R_Y$ are that, for all $\eta > 0$,

(i) $\int_0^\infty \min(u-\eta,\,0)\,dE(u) \le \int_0^\infty \min(u-\eta,\,0)\,dF(u)$;

(ii) $\int_0^\infty \min\left(\frac{1}{u}-\frac{1}{\eta},\,0\right)dG(u) \le \int_0^\infty \min\left(\frac{1}{u}-\frac{1}{\eta},\,0\right)dH(u)$;

where $E$ and $G$ are the c.d.f.'s of $f_2(X)/f_1(X)$ under $H_1$ and $H_2$, respectively, and $F$ and $H$ are the c.d.f.'s of $g_2(Y)/g_1(Y)$ under $H_1$ and $H_2$ respectively.

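Proof. From the well known theory of Bayes solutions (see [3], Chapter 6), the Bayes risk against $\xi$ using $X$ is given by

(1) $R_X(\xi) = \xi \int_{\{\xi f_1 < (1-\xi) f_2\}} f_1(x)\,d\psi(x) + (1-\xi) \int_{\{\xi f_1 \ge (1-\xi) f_2\}} f_2(x)\,d\psi(x)$.

With $\eta = \xi/(1-\xi)$, this can be written as

(2) $\frac{R_X(\xi)}{1-\xi} = \eta \int_{\{f_2/f_1 > \eta\}} f_1(x)\,d\psi(x) + \int_{\{f_2/f_1 \le \eta\}} f_2(x)\,d\psi(x)$.

With $E(u) = \int_{\{f_2/f_1 \le u\}} f_1(x)\,d\psi(x)$,

(3) $\frac{R_X(\xi)}{1-\xi} = \int_0^\eta u\,dE(u) + \eta \int_\eta^\infty dE(u) = \eta + \int_0^\infty \min(u-\eta,\,0)\,dE(u)$.

With $G(u) = \int_{\{f_2/f_1 \le u\}} f_2(x)\,d\psi(x)$,

(4) $\frac{R_X(\xi)}{\xi} = \frac{1}{\eta}\int_0^\eta dG(u) + \int_\eta^\infty \frac{1}{u}\,dG(u) = \frac{1}{\eta} + \int_0^\infty \min\left(\frac{1}{u}-\frac{1}{\eta},\,0\right)dG(u)$.

With the analogous expressions for the risk associated with $Y$, the conclusion is immediate.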
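The Bayes risk used in the proof above equals $\int \min(\xi f_1, (1-\xi) f_2)\,d\psi$, which can be evaluated numerically. The following Python sketch does so by trapezoidal quadrature for densities with respect to Lebesgue measure. The function names, the integration range, and the normal densities in the example are illustrative assumptions and do not come from the paper.

```python
import math

def normal_pdf(x, mean=0.0, var=1.0):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bayes_risk(xi, f1, f2, lo=-30.0, hi=30.0, steps=60000):
    """Trapezoidal approximation of the integral of min(xi*f1, (1-xi)*f2)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * min(xi * f1(x), (1.0 - xi) * f2(x))
    return total * h

def f1(x):
    # Hypothetical example: X ~ N(0, 1) under H1.
    return normal_pdf(x, 0.0, 1.0)

def f2(x):
    # Hypothetical example: X ~ N(2, 1) under H2.
    return normal_pdf(x, 2.0, 1.0)

# With equal prior weights the Bayes rule cuts at x = 1, so the risk
# is the standard normal tail probability beyond 1.
risk_half = bayes_risk(0.5, f1, f2)
```

When the two densities coincide the observation carries no information and the same routine returns $\min(\xi, 1-\xi)$, which is a convenient sanity check.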

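Lemma 2.2. (i) $R_X \le R_Y$ if and only if, for all $\eta > 0$,

$\int_0^\eta \alpha_X\!\left(\frac{u}{1+u}\right)du \le \int_0^\eta \alpha_Y\!\left(\frac{u}{1+u}\right)du$,

where $\alpha_X(\xi)$ is the probability, under $H_1$, that in following the Bayes procedure against $\xi$ with $X$, $H_2$ will be chosen.

(ii) If $G(u)/u \to 0$ as $u \to 0$, then $R_X \le R_Y$ if and only if, for all $\eta > 0$,

$\int_0^\eta u^{-2}\,\beta_X\!\left(\frac{u}{1+u}\right)du \ge \int_0^\eta u^{-2}\,\beta_Y\!\left(\frac{u}{1+u}\right)du$,

where $\beta_X(\xi)$ is the probability under $H_2$ that in following the Bayes procedure against $\xi$ with $X$, $H_1$ will be chosen.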


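Proof. From Equation (3) in the proof of Lemma 2.1,

(1) $\frac{R_X(\xi)}{1-\xi} - \eta = \int_0^\eta (u-\eta)\,dE(u)$.

Integrating (1) by parts yields

(2) $\frac{R_X(\xi)}{1-\xi} - \eta = -\int_0^\eta E(u)\,du$.

However, $E(u) = \int_{\{f_2/f_1 \le u\}} f_1(x)\,d\psi(x) = 1 - \alpha_X\!\left(\frac{u}{1+u}\right)$. Hence,

(3) $\frac{R_X(\xi)}{1-\xi} = \int_0^\eta \alpha_X\!\left(\frac{u}{1+u}\right)du$.

From the similar expression involving $R_Y$, conclusion (i) follows. A parallel argument from equation (4) in the proof of Lemma 2.1 yields conclusion (ii), noting that $G(u) = \beta_X\!\left(\frac{u}{1+u}\right)$.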

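Theorem 2.1. Two conditions, each necessary and sufficient that $R_X = R_Y$ are:

(i) $f_2(X)/f_1(X)$ and $g_2(Y)/g_1(Y)$ have the same distributions under $H_1$;

(ii) $f_2(X)/f_1(X)$ and $g_2(Y)/g_1(Y)$ have the same distributions under $H_2$.

Proof. The sufficiency is immediate from Lemma 2.1. To show the necessity, suppose $R_X = R_Y$; then for all $\eta \ge 0$,

(1) $\int_0^\infty \min(u-\eta,\,0)\,dE(u) = \int_0^\infty \min(u-\eta,\,0)\,dF(u)$.

Now, for any $a > 0$, let $\phi_n(u) = n \min(u-a,\,0)$ and let $\psi_n(u) = -n \min(u-(a+\tfrac{1}{n}),\,0)$, $n = 1, 2, 3, \ldots$. Then

(2) $\int_0^\infty (\phi_n(u)+\psi_n(u))\,dE(u) = \int_0^\infty (\phi_n(u)+\psi_n(u))\,dF(u)$

for all $n$. Hence,

(3) $E(a) + \int_{a<u<a+\frac{1}{n}} (1-n(u-a))\,dE(u) = F(a) + \int_{a<u<a+\frac{1}{n}} (1-n(u-a))\,dF(u)$.

Letting $n \to \infty$, $E(a) = F(a)$; that is, $f_2(X)/f_1(X)$ and $g_2(Y)/g_1(Y)$ have the same distribution under $H_1$. It follows immediately that $\alpha_X(\xi) = \alpha_Y(\xi)$, since $E(\eta) = 1-\alpha_X(\xi)$. Now $R_X(\xi) = \xi\,\alpha_X(\xi) + (1-\xi)\,\beta_X(\xi)$; hence, $R_X = R_Y$ and $\alpha_X = \alpha_Y$ implies $\beta_X = \beta_Y$, which is conclusion (ii).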

With these conditions that $R_X = R_Y$, attention is turned to the relation between the condition $R_X \le R_Y$ and the Kullback-Leibler information numbers.

The mean information per observation of $X$ for discriminating between $H_1$ and $H_2$ when $H_i$ is true is defined by Kullback and Leibler, [5], [6], to be

(2.1.1) $I_X(1:2) = \int_{-\infty}^{\infty} f_1(x) \log\frac{f_1(x)}{f_2(x)}\,d\psi(x)$, for $i = 1$, and

$I_X(2:1) = \int_{-\infty}^{\infty} f_2(x) \log\frac{f_2(x)}{f_1(x)}\,d\psi(x)$, for $i = 2$.

The mean divergence between $H_1$ and $H_2$ per observation of $X$ they then define to be

(2.1.2) $J_X = I_X(1:2) + I_X(2:1)$.

$I_X(1:2)$ and $I_X(2:1)$ will be referred to as the K-L numbers for $X$. The K-L numbers and the divergence for $Y$ are similarly defined.

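It is noted in passing that if the distribution of $X$ is of the exponential type, i.e., $f_i(x) = \beta(\omega_i)\,e^{\omega_i x}$, then

(2.1.3) $I_X(1:2) = \log\frac{\beta(\omega_1)}{\beta(\omega_2)} + (\omega_1-\omega_2)\,E_1(X)$, so that $J_X = (\omega_1-\omega_2)(E_1(X)-E_2(X))$,

where $E_i(X)$ denotes the expectation of $X$ under $H_i$. Thus, $J_X$ is an interesting measure of the 'distance' between $H_1$ and $H_2$ relative to the random variable $X$, being the product of two often considered measures.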
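As a numerical illustration of the exponential-type formula, a Poisson variable with mean $\lambda$ has density $\beta(\omega)e^{\omega x}$ with respect to the measure $\psi(\{x\}) = 1/x!$, where $\omega = \log\lambda$ and $\beta(\omega) = e^{-\lambda}$. The Python sketch below compares the closed form for $I_X(1:2)$ with direct summation and checks the product form of $J_X$. The Poisson family and the function names are assumptions chosen for illustration, not an example taken from the paper.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability of x for mean lam."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def kl_direct(lam1, lam2, terms=200):
    """I(1:2) by direct summation of f1 * log(f1 / f2)."""
    total = 0.0
    for x in range(terms):
        p1 = poisson_pmf(x, lam1)
        log_ratio = (lam2 - lam1) + x * (math.log(lam1) - math.log(lam2))
        total += p1 * log_ratio
    return total

def kl_exponential_type(lam1, lam2):
    """I(1:2) = log(beta(w1) / beta(w2)) + (w1 - w2) * E_1(X)."""
    w1, w2 = math.log(lam1), math.log(lam2)
    log_beta_ratio = lam2 - lam1  # log(e^{-lam1} / e^{-lam2})
    return log_beta_ratio + (w1 - w2) * lam1

def divergence(lam1, lam2):
    """J = (w1 - w2) * (E_1(X) - E_2(X))."""
    return (math.log(lam1) - math.log(lam2)) * (lam1 - lam2)
```

Calling `kl_exponential_type(2.0, 3.5) + kl_exponential_type(3.5, 2.0)` reproduces `divergence(2.0, 3.5)`, because the $\log\beta$ terms cancel in the sum.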


If $I_X(1:2) > I_Y(1:2)$ and $I_X(2:1) > I_Y(2:1)$, one would say that, in the Kullback-Leibler sense, $X$ is the more informative. The question that arises is that of the relation between being more informative in the Kullback-Leibler sense and being more informative in the sense of uniformly smaller Bayes risks. It will be seen in the remainder of this section that the two are not equivalent, but that interesting relations do exist.

Theorem 2.2. $R_X = R_Y$ implies equality of the corresponding K-L numbers for $X$ and for $Y$.

Proof. With $E$ and $F$ as defined in Lemma 2.1,

(1) $I_X(1:2) = \int_{-\infty}^{\infty} f_1(x) \log\frac{f_1(x)}{f_2(x)}\,d\psi(x) = -\int_0^\infty \log u\,dE(u)$, and

$I_Y(1:2) = \int_{-\infty}^{\infty} g_1(x) \log\frac{g_1(x)}{g_2(x)}\,d\psi(x) = -\int_0^\infty \log u\,dF(u)$.

By Theorem 2.1, $E = F$ and hence $I_X(1:2) = I_Y(1:2)$.

In the same way,

(2) $I_X(2:1) = \int_{-\infty}^{\infty} f_2(x) \log\frac{f_2(x)}{f_1(x)}\,d\psi(x) = \int_0^\infty u \log u\,dE(u)$, and

$I_Y(2:1) = \int_{-\infty}^{\infty} g_2(x) \log\frac{g_2(x)}{g_1(x)}\,d\psi(x) = \int_0^\infty u \log u\,dF(u)$.

Hence, $I_X(2:1) = I_Y(2:1)$ also.

Theorem 2.3. If $R_X \le R_Y$ then the K-L numbers for $X$ are greater than or equal to the corresponding K-L numbers for $Y$.

Proof. Again with $E$ and $F$ as defined in Lemma 2.1,

(1) $\int_0^\infty u\,dE(u) = \lim_{\eta\to\infty} \int_0^\eta u\,dE(u) = \lim_{\eta\to\infty} \int_{\{f_2/f_1 \le \eta\}} f_2(x)\,d\psi(x) = 1$.

Similarly, $\int_0^\infty u\,dF(u) = 1$. Hence, for $\phi$ any linear function,

(2) $\int_0^\infty \phi(u)\,dE(u) = \int_0^\infty \phi(u)\,dF(u)$.

By Lemma 2.1,

(3) $\int_0^\infty \min(u-\eta,\,0)\,dE(u) \le \int_0^\infty \min(u-\eta,\,0)\,dF(u)$.

It is easily seen, then, that for any concave function, $\phi$,

(4) $\int_0^\infty \phi(u)\,dE(u) \le \int_0^\infty \phi(u)\,dF(u)$.

In particular, for $\phi(u) = \log u$,

(5) $I_X(1:2) = -\int_0^\infty \log u\,dE(u) \ge -\int_0^\infty \log u\,dF(u) = I_Y(1:2)$;

while for $\phi(u) = -u \log u$,

(6) $-I_X(2:1) = -\int_0^\infty u \log u\,dE(u) \le -\int_0^\infty u \log u\,dF(u) = -I_Y(2:1)$.

Equations (5) and (6) yield the conclusion of the theorem.

In the matter of converses to Theorems 2.2 and 2.3, no general theorems were obtained. In each special case investigated, equality of the corresponding K-L numbers was found to be equivalent to equality of the risks, but a uniform inequality of the K-L numbers failed to imply a uniform inequality between the risks.


2.2 The Case of Normal Distributions. Attention is now turned to the particular case in which both $X$ and $Y$ have normal distributions under each hypothesis. Since, for normal distributions, both the risk function and the K-L numbers are invariant under affine transformations, there is no loss of generality in treating the situation given by the following diagram:

              X                      Y
   H_1     N(0, 1)                N(0, 1)
   H_2     N(\mu, \sigma^2)       N(m, v)

where $\mu \ge 0$, $m \ge 0$, and $\sigma^2$

The K-L numbers for $X$ are:

(2.2.1) $I_X(1:2) = \frac{1}{2}\left[\log\sigma^2 - 1 + \frac{1}{\sigma^2} + \frac{\mu^2}{\sigma^2}\right]$, and

$I_X(2:1) = \frac{1}{2}\left[\log\frac{1}{\sigma^2} - 1 + \sigma^2 + \mu^2\right]$.

Those for $Y$ are, of course, the same with the obvious substitutions.
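The closed forms for the normal K-L numbers can be checked against direct numerical integration of $\int f_a \log(f_a/f_b)$. The Python sketch below does this with a trapezoidal rule. The function names, the integration range, and the parameter values $\mu = 1$, $\sigma^2 = 4$ used in the check are assumptions for illustration.

```python
import math

def kl_12(mu, var):
    """I(1:2): N(0,1) is true, the alternative is N(mu, var)."""
    return 0.5 * (math.log(var) - 1.0 + 1.0 / var + mu ** 2 / var)

def kl_21(mu, var):
    """I(2:1): N(mu, var) is true, the alternative is N(0,1)."""
    return 0.5 * (math.log(1.0 / var) - 1.0 + var + mu ** 2)

def log_pdf(x, mean, var):
    """Logarithm of the N(mean, var) density at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def kl_numeric(mean_a, var_a, mean_b, var_b, lo=-40.0, hi=40.0, steps=80000):
    """Trapezoidal approximation of the integral of f_a * log(f_a / f_b)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        log_a = log_pdf(x, mean_a, var_a)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(log_a) * (log_a - log_pdf(x, mean_b, var_b))
    return total * h
```

For example, `kl_12(1.0, 4.0)` should agree with `kl_numeric(0.0, 1.0, 1.0, 4.0)`, and `kl_21(1.0, 4.0)` with `kl_numeric(1.0, 4.0, 0.0, 1.0)`, up to quadrature error.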
Theorem 2.4. The following three statements are equivalent.

(i) $R_X = R_Y$.

(ii) $I_X(1:2) = I_Y(1:2)$ and $I_X(2:1) = I_Y(2:1)$.

(iii) $\sigma^2 = v$ and $\mu = m$.

Proof. By Theorem 2.2, (i) implies (ii). Further, (iii) clearly implies (i). Hence, it is necessary only to show that (ii) implies (iii). Assuming (ii) to be true, then

(1) $\log\frac{\sigma^2}{v} = \frac{1}{v} - \frac{1}{\sigma^2} + \frac{m^2}{v} - \frac{\mu^2}{\sigma^2}$

and

(2) $\log\frac{\sigma^2}{v} = \sigma^2 - v + \mu^2 - m^2$.

Suppose (iii) not true, in particular, that $\sigma^2 > v$.

Case I: $\sigma^2 > 1$. Multiplying equation (2) by $1/v$ and adding to equation (1) it is found that $\mu^2$ is of the same sign as

(3) $A(\sigma^2, v) = (v+1)\log\frac{\sigma^2}{v} - 1 - \sigma^2 + v + \frac{v}{\sigma^2}$.

But $A(v,v) = 0$ and $\frac{\partial}{\partial\sigma^2} A(\sigma^2, v) = \frac{(\sigma^2-v)(1-\sigma^2)}{\sigma^4} < 0$. Hence $\mu^2 < 0$, a clear absurdity.

Case II: $\sigma^2 \le 1$. Multiplying equation (2) by $1/\sigma^2$ and adding to equation (1) it is seen that $m^2$ is of the same sign as

(4) $B(\sigma^2, v) = (\sigma^2+1)\log\frac{\sigma^2}{v} + 1 - \sigma^2 + v - \frac{\sigma^2}{v}$.

But $B(v,v) = 0$ and $\frac{\partial}{\partial\sigma^2} B(\sigma^2, v) = \left(\log\frac{1}{v} - \frac{1}{v}\right) - \left(\log\frac{1}{\sigma^2} - \frac{1}{\sigma^2}\right)$, which is negative, since $\log x - x$ is strictly decreasing for $x \ge 1$. Hence, a similar contradiction is reached: $m^2 < 0$.

In either case, it must be concluded that if (ii) is true, then $\sigma^2 = v$ and, as an immediate consequence, $m = \mu$.

It is noted in passing that the same line of argument yields the

Corollary: For $v < \sigma^2$, $I_X(1:2) \ge I_Y(1:2)$ implies that $I_X(2:1) > I_Y(2:1)$, while for $v > \sigma^2$, $I_X(2:1) \ge I_Y(2:1)$ implies that $I_X(1:2) > I_Y(1:2)$.

For a further analysis of the case of normal distributions, assume $\sigma^2$ and $\mu$ fixed, $\sigma^2 > 1$, and consider the $(v, m^2)$ plane. One can immediately determine the region in which $I_X(1:2) \ge I_Y(1:2)$ and that in which $I_X(2:1) \ge I_Y(2:1)$. From equations (2.2.1), $I_X(1:2) \ge I_Y(1:2)$ if and only if

(2.2.2) $m^2 \le h_1(v) \equiv \frac{v}{\sigma^2}(1+\mu^2) + v\log\sigma^2 - v\log v - 1$.

$I_X(2:1) \ge I_Y(2:1)$ if and only if

(2.2.3) $m^2 \le h_2(v) \equiv \mu^2 + \sigma^2 - \log\sigma^2 + \log v - v$.

That $h_1(v) \le h_2(v)$ for $v \le \sigma^2$, with equality only at $v = \sigma^2$, is a consequence of Theorem 2.4 and corollary. (It can be shown similarly that for $v > \sigma^2$, $h_2(v) < h_1(v)$ for all $v$ for which $h_2(v) \ge 0$.)

Together with Theorem 2.3, these results yield the result that for $1 \le v \le \sigma^2$,

(2.2.4) $\{(v, m^2) : R_X \le R_Y\} \subset \{(v, m^2) : m^2 \le h_1(v)\}$.

To investigate more fully the relation of these two sets, in particular, to see if, perchance, they are equal, the risk functions must be computed. From this point on, both $v$ and $\sigma^2$ will be assumed to be greater than 1. For the particular case under consideration, the probabilities of the two types of errors when using the Bayes procedure against $\xi$ based on $X$, $\alpha_X$ and $\beta_X$, are easily computed.

(2.2.5) $\alpha_X(\xi) = \Pr\left(X^2 - \left(\frac{X-\mu}{\sigma}\right)^2 > 2\log\eta\sigma \,\Big|\, H_1\right) = 1 - \Pr\left(\left|X + \frac{\mu}{\sigma^2-1}\right| \le \frac{\sigma}{\sigma^2-1}\sqrt{\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2} \,\Big|\, H_1\right)$,

where $\eta = \xi/(1-\xi)$ and it is to be understood that $\alpha_X(\xi) = 1$ if $\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2 < 0$.

(2.2.6) $\beta_X(\xi) = \Pr\left(X^2 - \left(\frac{X-\mu}{\sigma}\right)^2 < 2\log\eta\sigma \,\Big|\, H_2\right)$.

Since the distribution of $(X-\mu)/\sigma$ under $H_2$ is the same as that of $X$ under $H_1$, (2.2.6) can be expressed as

(2.2.7) $\beta_X(\xi) = \Pr\left(\left|X + \frac{\mu\sigma}{\sigma^2-1}\right| \le \frac{1}{\sigma^2-1}\sqrt{\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2} \,\Big|\, H_1\right)$,

where again $\eta = \xi/(1-\xi)$ and it is to be understood that $\beta_X(\xi) = 0$ whenever $\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2 < 0$.

Since $R_X(\xi) = \xi\,\alpha_X(\xi) + (1-\xi)\,\beta_X(\xi)$, it is seen that for $\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2 < 0$, $R_X(\xi) = \xi$. The computation of $\alpha_Y$ and $\beta_Y$ will clearly yield the same expressions as (2.2.5) and (2.2.7) with the obvious substitutions of parameters. Thus, for $m^2 + (v-1)\log\eta^2 v < 0$, $R_Y(\xi) = \xi$.

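Lemma 2.3. $\frac{d}{d\xi}R_X(\xi) < 1$ if and only if $\log\eta^2 > -\left(\frac{\mu^2}{\sigma^2-1} + \log\sigma^2\right)$.

Proof. It is first shown that

(1) $\frac{d}{d\xi}R_X(\xi) = \alpha_X(\xi) - \beta_X(\xi)$.

Setting $A = \sqrt{\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2}$, $x_\pm = \frac{-\mu \pm \sigma A}{\sigma^2-1}$ and $z_\pm = \frac{-\mu\sigma \pm A}{\sigma^2-1}$, and noting that $\frac{dA}{d\xi} = \frac{\sigma^2-1}{A\,\xi(1-\xi)}$,

(2) $\frac{d}{d\xi}\alpha_X(\xi) = -\frac{\sigma}{\sqrt{2\pi}\,A\,\xi(1-\xi)}\left[e^{-\frac{1}{2}x_+^2} + e^{-\frac{1}{2}x_-^2}\right]$,

(3) $\frac{d}{d\xi}\beta_X(\xi) = \frac{1}{\sqrt{2\pi}\,A\,\xi(1-\xi)}\left[e^{-\frac{1}{2}z_+^2} + e^{-\frac{1}{2}z_-^2}\right]$.

From (2) and (3) one obtains, after some simplifications, that

(4) $\xi\frac{d\alpha_X}{d\xi} + (1-\xi)\frac{d\beta_X}{d\xi} = \frac{1}{\sqrt{2\pi}\,A\,\xi}\left[e^{-\frac{1}{2}z_+^2} + e^{-\frac{1}{2}z_-^2} - \eta\sigma\left(e^{-\frac{1}{2}x_+^2} + e^{-\frac{1}{2}x_-^2}\right)\right]$.

Since $z_\pm^2 = x_\pm^2 - \log\eta^2\sigma^2$, the bracketed quantity can be written as

(5) $\left(e^{\frac{1}{2}\log\eta^2\sigma^2} - \eta\sigma\right)\left(e^{-\frac{1}{2}x_+^2} + e^{-\frac{1}{2}x_-^2}\right) = 0$.

Thus

$\frac{d}{d\xi}R_X(\xi) = \alpha_X(\xi) - \beta_X(\xi)$.

Now if $\log\eta^2 > -\left(\frac{\mu^2}{\sigma^2-1} + \log\sigma^2\right)$, then $\mu^2 + (\sigma^2-1)\log\eta^2\sigma^2 > 0$ and therefore both $\alpha_X(\xi) < 1$ and $\beta_X(\xi) > 0$, which establishes the lemma.

The preceding proof that $\frac{d}{d\xi}R_X(\xi) = \alpha_X(\xi) - \beta_X(\xi)$ in this special case is included as it illustrates a situation in which certain terms are shown to be zero. Similar situations will arise later and the method here used will be referred to. The fact that the derivative of the risk curve is so related to $\alpha$ and $\beta$ is, however, a very general result for statistical games with two states of nature, two actions, and a 0 or 1 loss function. The fact is essentially demonstrated by Blackwell and Girshick ([3], Section 6.3) and from their discussion it is clear that a rigorous proof can easily be given the proposition that $\frac{d}{d\xi}R(\xi) = \alpha(\xi) - \beta(\xi)$ whenever the left member exists (as it does almost everywhere).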
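The identity $dR_X(\xi)/d\xi = \alpha_X(\xi) - \beta_X(\xi)$ can be checked numerically in the normal case, where under $H_1$ the observation is $N(0,1)$ and under $H_2$ it is $N(\mu, \sigma^2)$ with $\sigma^2 > 1$. The Python sketch below computes the two error probabilities in closed form from the standard normal c.d.f. and compares a central finite difference of the risk with $\alpha_X - \beta_X$. The function names and the step size are assumptions for illustration.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def error_probs(xi, mean, var):
    """(alpha, beta) for testing N(0,1) against N(mean, var), var > 1."""
    eta = xi / (1.0 - xi)
    a_sq = mean ** 2 + (var - 1.0) * math.log(eta ** 2 * var)
    if a_sq <= 0.0:
        return 1.0, 0.0  # H2 is chosen whatever is observed
    a = math.sqrt(a_sq)
    sd = math.sqrt(var)
    # H1 is chosen when the observation falls in [x_lo, x_hi].
    x_lo = (-mean - sd * a) / (var - 1.0)
    x_hi = (-mean + sd * a) / (var - 1.0)
    alpha = 1.0 - (std_normal_cdf(x_hi) - std_normal_cdf(x_lo))
    # The same interval after standardising under H2: z = (x - mean) / sd.
    z_lo = (-mean * sd - a) / (var - 1.0)
    z_hi = (-mean * sd + a) / (var - 1.0)
    beta = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)
    return alpha, beta

def risk(xi, mean, var):
    """Bayes risk xi * alpha + (1 - xi) * beta."""
    alpha, beta = error_probs(xi, mean, var)
    return xi * alpha + (1.0 - xi) * beta

def risk_slope(xi, mean, var, step=1e-5):
    """Central finite-difference estimate of dR/dxi."""
    return (risk(xi + step, mean, var) - risk(xi - step, mean, var)) / (2.0 * step)
```

For small $\xi$ the routine returns $R(\xi) = \xi$, so the slope there is 1, in line with the statement of Lemma 2.3.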

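Lemma 2.3, and its analogue for $Y$, show that

(2.2.8) $\frac{d}{d\xi}R_X(\xi) = 1$ if $\log\eta^2 \le -\left(\frac{\mu^2}{\sigma^2-1} + \log\sigma^2\right)$, and $\frac{d}{d\xi}R_X(\xi) < 1$ if $\log\eta^2 > -\left(\frac{\mu^2}{\sigma^2-1} + \log\sigma^2\right)$.

And

$\frac{d}{d\xi}R_Y(\xi) = 1$ if $\log\eta^2 \le -\left(\frac{m^2}{v-1} + \log v\right)$, and $\frac{d}{d\xi}R_Y(\xi) < 1$ if $\log\eta^2 > -\left(\frac{m^2}{v-1} + \log v\right)$.

From (2.2.8) it is clear that a necessary condition that $R_X \le R_Y$ is that

$\frac{\mu^2}{\sigma^2-1} + \log\sigma^2 \ge \frac{m^2}{v-1} + \log v$, or

(2.2.9) $m^2 \le h_3(v) \equiv (v-1)\left(\frac{\mu^2}{\sigma^2-1} + \log\frac{\sigma^2}{v}\right)$.

As a consequence of Theorem 2.3, it must be true that $h_3(v) \le h_1(v)$ for $1 \le v \le \sigma^2$. But it is easily verified by differentiation that equality holds only for $v = \sigma^2$. Thus, any pair $(v, m^2)$ with $h_3(v) < m^2 < h_1(v)$ provides an example in which $X$ is more informative in the sense of the Kullback-Leibler information numbers but $R_X \le R_Y$ fails to hold.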
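A concrete instance of such a pair can be produced numerically. The Python sketch below takes $h_1(v) = v[(1+\mu^2)/\sigma^2 + \log\sigma^2 - \log v] - 1$, $h_2(v) = \mu^2 + \sigma^2 - \log\sigma^2 + \log v - v$ and $h_3(v) = (v-1)[\mu^2/(\sigma^2-1) + \log(\sigma^2/v)]$, and evaluates the two Bayes risks in closed form. The parameter values ($\mu = 1$, $\sigma^2 = 4$, $v = 2$, $m^2 = 1.2$, $\xi = 0.29$) and all function names are hypothetical choices made for illustration.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_risk(xi, mean, var):
    """Bayes risk against xi for N(0,1) versus N(mean, var), var > 1."""
    eta = xi / (1.0 - xi)
    a_sq = mean ** 2 + (var - 1.0) * math.log(eta ** 2 * var)
    if a_sq <= 0.0:
        return xi  # H2 is always chosen, so the risk is xi
    a, sd, d = math.sqrt(a_sq), math.sqrt(var), var - 1.0
    alpha = 1.0 - (std_normal_cdf((-mean + sd * a) / d)
                   - std_normal_cdf((-mean - sd * a) / d))
    beta = (std_normal_cdf((-mean * sd + a) / d)
            - std_normal_cdf((-mean * sd - a) / d))
    return xi * alpha + (1.0 - xi) * beta

def h1(v, mu, s2):
    return v * ((1.0 + mu ** 2) / s2 + math.log(s2) - math.log(v)) - 1.0

def h2(v, mu, s2):
    return mu ** 2 + s2 - math.log(s2) + math.log(v) - v

def h3(v, mu, s2):
    return (v - 1.0) * (mu ** 2 / (s2 - 1.0) + math.log(s2 / v))

MU, S2, V = 1.0, 4.0, 2.0  # hypothetical parameter values
M_SQ = 1.2                 # chosen so that h3(V) < M_SQ < h1(V)
XI = 0.29                  # a prior at which R_X exceeds R_Y

risk_x = bayes_risk(XI, MU, S2)
risk_y = bayes_risk(XI, math.sqrt(M_SQ), V)
```

With these values $m^2$ lies below both $h_1(v)$ and $h_2(v)$, so $X$ has the larger K-L numbers, yet `risk_x` equals $\xi$ while `risk_y` is strictly smaller, so the uniform inequality $R_X \le R_Y$ does not hold.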

Thus far the results have been necessary conditions that $R_X \le R_Y$. The most restrictive of these is that $m^2 \le h_3(v)$. The principal result of the remainder of this section will be that $m^2 \le h_3(v)$ is also a sufficient condition that $R_X \le R_Y$.

Lemma 2.4. For fixed $v > 1$, $R_Y(\xi)$ is a non-increasing function of $m$ for each $\xi$.

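Proof. From the expressions already derived for $\alpha_Y$ and $\beta_Y$ it follows that, with $L = \frac{\sqrt{v}}{v-1}\sqrt{m^2 + (v-1)\log\eta^2 v}$,

(1) $\sqrt{2\pi}\left(\frac{R_Y(\xi)}{1-\xi} - \eta\right) = \frac{1}{\sqrt{v}}\int_{-L}^{L} e^{-\frac{1}{2v}\left(t+\frac{mv}{v-1}\right)^2}dt - \eta\int_{-L}^{L} e^{-\frac{1}{2}\left(t+\frac{m}{v-1}\right)^2}dt$.

Let $a \sim b$ denote that $a$ and $b$ are of the same sign. Then,

(2) $\frac{\partial R_Y(\xi)}{\partial m} \sim \frac{\partial L}{\partial m}\left[\frac{1}{\sqrt{v}}\left(e^{-\frac{1}{2v}\left(L+\frac{mv}{v-1}\right)^2} + e^{-\frac{1}{2v}\left(L-\frac{mv}{v-1}\right)^2}\right) - \eta\left(e^{-\frac{1}{2}\left(L+\frac{m}{v-1}\right)^2} + e^{-\frac{1}{2}\left(L-\frac{m}{v-1}\right)^2}\right)\right] + \int_{-L}^{L}\frac{\partial}{\partial m}\left[\frac{1}{\sqrt{v}}e^{-\frac{1}{2v}\left(t+\frac{mv}{v-1}\right)^2} - \eta e^{-\frac{1}{2}\left(t+\frac{m}{v-1}\right)^2}\right]dt$.

By a reduction paralleling that used in the proof of Lemma 2.3, the first term of the right member of (2) is found to be zero, while if the indicated differentiation in the second term is carried out, the resulting integral can be evaluated and one finds that

(3) $\frac{\partial R_Y(\xi)}{\partial m} \sim \sqrt{v}\left[e^{-\frac{1}{2v}\left(L+\frac{mv}{v-1}\right)^2} - e^{-\frac{1}{2v}\left(L-\frac{mv}{v-1}\right)^2}\right] - \eta\left[e^{-\frac{1}{2}\left(L+\frac{m}{v-1}\right)^2} - e^{-\frac{1}{2}\left(L-\frac{m}{v-1}\right)^2}\right]$.

Since $\eta\,e^{-\frac{1}{2}\left(L\pm\frac{m}{v-1}\right)^2} = \frac{1}{\sqrt{v}}\,e^{-\frac{1}{2v}\left(L\pm\frac{mv}{v-1}\right)^2}$, which is the identity by which the first term of (2) reduces to zero, (3) can be reduced to

(4) $\frac{\partial R_Y(\xi)}{\partial m} \sim e^{-\frac{Lm}{v-1}} - e^{\frac{Lm}{v-1}}$.

Since $L \ge 0$, it follows from (4) that $\frac{\partial}{\partial m}R_Y(\xi) \le 0$ and the proof is done.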

It can now be concluded that there are two non-negative single-valued functions, say $\phi_1$ and $\phi_2$, of $v$ ($v > 1$, $\sigma^2 > 1$), such that for $m^2 \le \phi_1(v)$, $R_X \le R_Y$ and for $m^2 \ge \phi_2(v)$, $R_X \ge R_Y$. The possibility that $\phi_1 \equiv 0$ or that $\phi_2 \equiv \infty$ is not at this point excluded.

Let $\phi$ be a non-negative, differentiable function of $v$ ($v > 1$) with $\phi(\sigma^2) = \mu^2$. Now set $m^2 = \phi(v)$ and consider $R_Y(\xi)$ as a function of $v$. From equation (1) in the proof of Lemma 2.4,

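(2.2.10) $\sqrt{2\pi}\left(\frac{R_Y(\xi)}{1-\xi} - \eta\right) = \frac{1}{\sqrt{v}}\int_{-L}^{L} e^{-\frac{1}{2v}(t-vC)^2}dt - \eta\int_{-L}^{L} e^{-\frac{1}{2}(t-C)^2}dt$,

where

$C = \frac{\sqrt{\phi(v)}}{v-1}$, $L = \frac{\sqrt{v}}{v-1}\sqrt{\phi(v) + (v-1)\log\eta^2 v}$.

Differentiating with respect to $v$ yields the equation

(2.2.11) $\frac{\sqrt{2\pi}}{1-\xi}\frac{\partial R_Y(\xi)}{\partial v} = \frac{\partial L}{\partial v}\left[\frac{1}{\sqrt{v}}\left(e^{-\frac{1}{2v}(L-vC)^2} + e^{-\frac{1}{2v}(L+vC)^2}\right) - \eta\left(e^{-\frac{1}{2}(L-C)^2} + e^{-\frac{1}{2}(L+C)^2}\right)\right] + \int_{-L}^{L}\frac{\partial}{\partial v}\left[\frac{1}{\sqrt{v}}e^{-\frac{1}{2v}(t-vC)^2} - \eta e^{-\frac{1}{2}(t-C)^2}\right]dt$.

The first term of the right member reduces to zero as in the preceding proofs. Then, carrying out the differentiation indicated in the last term and rearranging,

(2.2.12) $\frac{\sqrt{2\pi}}{1-\xi}\frac{\partial R_Y(\xi)}{\partial v} = \frac{dC}{dv}\left[\frac{1}{\sqrt{v}}\int_{-L}^{L}(t-vC)\,e^{-\frac{1}{2v}(t-vC)^2}dt - \eta\int_{-L}^{L}(t-C)\,e^{-\frac{1}{2}(t-C)^2}dt\right] + \frac{1}{2v^2\sqrt{v}}\int_{-L}^{L}(t^2 - v^2C^2 - v)\,e^{-\frac{1}{2v}(t-vC)^2}dt$.

Evaluating the integrals in the first term and proceeding in the same manner as in going from (3) to (4) in the proof of Lemma 2.4, one has,

(2.2.13) $\frac{\sqrt{2\pi}}{1-\xi}\frac{\partial R_Y(\xi)}{\partial v} = \frac{dC}{dv}\,\frac{v-1}{\sqrt{v}}\,e^{-\frac{L^2+v^2C^2}{2v}}\left[e^{-LC} - e^{LC}\right] + \frac{1}{2v^2\sqrt{v}}\int_{-L}^{L}(t^2 - v^2C^2 - v)\,e^{-\frac{1}{2v}(t-vC)^2}dt$,

where

$\frac{dC}{dv} = \frac{(v-1)\phi'(v) - 2\phi(v)}{2(v-1)^2\sqrt{\phi(v)}}$.

The second term of the right member of (2.2.13), the integral, is negative for all $L > 0$, since for small $L$ the integrand is negative and as $L$ increases beyond the point at which the maximum value of the integrand is zero, the value of the integral increases monotonically to a limit whose value is easily found by a direct integration to be zero.

It may be noted at this point that for $\frac{dC}{dv} = 0$, i.e., for $\phi(v) = \frac{\mu^2}{(\sigma^2-1)^2}(v-1)^2$, the derivative $\frac{\partial}{\partial v}R_Y(\xi)$ is negative for all $\xi$. Combined with Lemma 2.4 this yields

(i) $R_X \le R_Y$ for $1 \le v \le \sigma^2$ and $m^2 \le \frac{\mu^2}{(\sigma^2-1)^2}(v-1)^2$,

(ii) $R_X \ge R_Y$ for $v \ge \sigma^2$ and $m^2 \ge \frac{\mu^2}{(\sigma^2-1)^2}(v-1)^2$.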


However, let $\phi$ be the function $h_3$ defined by (2.2.9). It is asserted that for this choice of $\phi$ the right member of (2.2.13) is less than or equal to zero for all $L > 0$. To show this, note first that with $\phi = h_3$,

$\frac{dC}{dv} = -\frac{1 + vC^2}{2v(v-1)C}$.

Now consider the right member of (2.2.13) as a function, $\psi$, of $L$ for $L \ge 0$. $\psi(0) = \psi(+\infty) = 0$. Thus, to show that $\psi$ is negative for all $L$, it will suffice to show that there is an $L'$ such that $\psi'(L) < 0$ for $L < L'$ and $\psi'(L) > 0$ for all $L > L'$.

(2.2.14) $\psi'(L) = \frac{e^{-\frac{1}{2v}(L+vC)^2}}{2v^2\sqrt{v}}\left\{2v(v-1)\frac{dC}{dv}\left[(L-vC)e^{2LC} - (L+vC)\right] + (L^2 - v^2C^2 - v)(e^{2LC}+1)\right\} = \frac{L\,e^{-\frac{1}{2v}(L+vC)^2}}{2v^2\sqrt{v}}\left[e^{2LC}\left(L - \frac{1+vC^2}{C}\right) + L + \frac{1+vC^2}{C}\right]$.

Denote the bracketed factor in this last expression in (2.2.14) by $\gamma(L)$. Then

$\gamma'(L) = e^{2LC}\left[1 + 2C\left(L - \frac{1+vC^2}{C}\right)\right] + 1$,

(2.2.15) $\gamma''(L) = 4C^2 e^{2LC}(L - vC)$.

From (2.2.15) one sees, then, that $\gamma''(L)$ is negative for $L < vC$ and is positive for $L > vC$. Hence, $\gamma$ is concave on the interval $(0, vC)$ and convex on the interval $(vC, +\infty)$. But $\gamma(0) = 0$ and $\gamma(+\infty) = +\infty$. Hence there is an $L'$ such that $\gamma$, and therefore $\psi'$, is negative for $L < L'$ and positive for $L > L'$. In this way the proof is complete for the

Lemma 2.5. For $m^2 = h_3(v)$, $v > 1$, $R_Y(\xi)$ is for each $\xi$ a non-increasing function of $v$.

Combining the results of Lemmas 2.3, 2.4, and 2.5, it is seen that the following theorem, giving restricted necessary and sufficient conditions for uniform inequalities between the risks, holds.

Theorem 2.5. For $1 \le v \le \sigma^2$,

$\{(v, m^2) : R_X \le R_Y\} = \{(v, m^2) : m^2 \le h_3(v)\}$.

For $v \ge \sigma^2 > 1$,

$\{(v, m^2) : R_X \ge R_Y\} = \{(v, m^2) : m^2 \ge h_3(v)\}$.
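The sufficiency half of the first statement can be explored numerically. The Python sketch below places $(v, m^2)$ on the curve $m^2 = h_3(v)$, taking $h_3(v) = (v-1)[\mu^2/(\sigma^2-1) + \log(\sigma^2/v)]$, and records the largest value of $R_X(\xi) - R_Y(\xi)$ over a grid of $v$ in $(1, \sigma^2]$ and of $\xi$ in $(0,1)$. The parameter values, the grids, and the function names are assumptions for illustration; a grid check of this kind is only a plausibility test, not a proof.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_risk(xi, mean, var):
    """Bayes risk against xi for N(0,1) versus N(mean, var), var > 1."""
    eta = xi / (1.0 - xi)
    a_sq = mean ** 2 + (var - 1.0) * math.log(eta ** 2 * var)
    if a_sq <= 0.0:
        return xi
    a, sd, d = math.sqrt(a_sq), math.sqrt(var), var - 1.0
    alpha = 1.0 - (std_normal_cdf((-mean + sd * a) / d)
                   - std_normal_cdf((-mean - sd * a) / d))
    beta = (std_normal_cdf((-mean * sd + a) / d)
            - std_normal_cdf((-mean * sd - a) / d))
    return xi * alpha + (1.0 - xi) * beta

def h3(v, mu, s2):
    return (v - 1.0) * (mu ** 2 / (s2 - 1.0) + math.log(s2 / v))

def max_violation(mu, s2, v_values, xi_values):
    """Largest value of R_X - R_Y over the grid, with m^2 = h3(v)."""
    worst = -1.0
    for v in v_values:
        m = math.sqrt(h3(v, mu, s2))
        for xi in xi_values:
            gap = bayes_risk(xi, mu, s2) - bayes_risk(xi, m, v)
            worst = max(worst, gap)
    return worst

MU, S2 = 1.0, 4.0                              # hypothetical parameter values
V_GRID = [1.25 + 0.25 * k for k in range(12)]  # 1.25, 1.5, ..., 4.0
XI_GRID = [0.01 * k for k in range(1, 100)]    # 0.01, 0.02, ..., 0.99
worst_gap = max_violation(MU, S2, V_GRID, XI_GRID)
```

If the theorem is read as above, `worst_gap` should be zero up to rounding, the value zero being attained at $v = \sigma^2$ where the two experiments coincide.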

It would at this point be pleasant if a choice of $\phi$ could be made such that for $m^2 = \phi(v)$, $v > 1$, $R_Y(\xi)$ would be for each $\xi$ a non-decreasing function of $v$, i.e., such that $\psi(L) \ge 0$ for all $L > 0$. Two necessary conditions for such to be the case are immediate, namely, $\psi'(L) \le 0$ for all sufficiently large $L$, and $\frac{dC}{dv} < 0$. But from (2.2.14), $\psi'(L)$ is of the same sign as

$2v(v-1)\frac{dC}{dv}\left[(L-vC)e^{2LC} - (L+vC)\right] + (L^2 - v^2C^2 - v)(e^{2LC}+1)$.

Now let $-2v(v-1)\frac{dC}{dv} = P > 0$. Then $\psi'(L)$ is of the same sign as

$e^{2LC}\left[L^2 - v^2C^2 - v - P(L-vC)\right] + L^2 - v^2C^2 - v + P(L+vC)$,

and for given $P$ and $v$ this becomes and remains positive as $L$ increases. Hence, one cannot find a curve along which $R_Y(\xi)$ is for each $\xi$ non-decreasing in $v$, except the degenerate case $v = \sigma^2$, where $R_Y(\xi)$ is uniformly (in $\xi$) non-decreasing as $m$ decreases.

Now by Lemma 2.4 there is a function, call it $h_4$, such that for $1 \le v \le \sigma^2$,

$\{(v, m^2) : R_X \ge R_Y\} = \{(v, m^2) : m^2 \ge h_4(v)\}$.

From the preceding paragraph it follows that in general $h_4 \ne h_2$, since for any point $(v_1, m_1^2)$ let $R_Y^1$ be the associated risk curve (in the obvious way) and let $B(v_1, m_1^2) = \{(v, m^2) : R_Y \le R_Y^1\}$. It is asserted that the lower boundary of $B(v_1, m_1^2)$ does not in general coincide with the line of constant $I(2:1)$ through $(v_1, m_1^2)$. Suppose it did. Let $(v_1, m_1^2)$ lie on the curve of $h_2$, $1 < v_1 < \sigma^2$. Then for $(v, m^2)$ also on the curve of $h_2$ and $1 \le v < v_1$, one would have $R_X \ge R_Y^1 \ge R_Y$. But this would imply that for each $\xi$, $R_Y(\xi)$ is non-decreasing in $v$, for $m^2 = h_2(v)$. Since this has been just shown to be impossible, and $h_4 \ge h_2$ according to Theorem 2.3, it must be concluded that $h_4 > h_2$ for $1 \le v < \sigma^2$.

Many of the interesting results of this section can be summarized in the following way. For $1 \le v \le \sigma^2$, the four functions $h_1$, $h_2$, $h_3$, and $h_4$ determine five sets: for $m^2 \le h_3$, $X$ is more informative than $Y$ both in the Kullback-Leibler sense and in the sense that $R_X \le R_Y$; for $h_3 < m^2 \le h_1$, $X$ is the more informative only in the Kullback-Leibler sense; for $h_1 < m^2 < h_2$, neither random variable is the more informative in either sense; for $h_2 \le m^2 < h_4$, $Y$ is the more informative in the Kullback-Leibler sense only; and for $h_4 \le m^2$, $Y$ is the more informative in both senses. From the results and methods of this section it can be verified that if $v > \sigma^2 > 1$, then for $m^2 \le h_4$, $X$ is the more informative in each sense; for $h_4 < m^2 \le h_2$, $X$ is the more informative in the Kullback-Leibler sense only; for $h_2 < m^2 < h_1$, neither is more informative in either sense; for $h_1 \le m^2 < h_3$, $Y$ is the more informative in the Kullback-Leibler sense only; and for $h_3 \le m^2$, $Y$ is the more informative in both senses. The function $h_4$ has not as yet been explicitly given.
2.3 K-L

Attention is next turned to another and interesting point of view with regard to the K-L information numbers. Suppose that the densities under consideration are elements of a class, $\{f_\omega : \omega \in \Omega\}$, of densities positive on the same set. Assume that there is an Abelian group, $T$, of transformations of the domain of the $f_\omega$ and a corresponding group, $\bar{T}$, of transformations of $\Omega$ such that if $X$ has density $f_\omega$ then for $t \in T$, $t(X)$ has density $f_{\bar{t}(\omega)}(x) = f_\omega(t^{-1}x)\,\mu(t^{-1})$; that is, $d\psi(t^{-1}x) = \mu(t^{-1})\,d\psi(x)$. Finally, assume that given $\omega_1$ and $\omega_2$ in $\Omega$, there is a $t \in T$ such that $\omega_2 = \bar{t}(\omega_1)$.

Theorem 2.6.  The K-L numbers are functions only of the transformation
that carries $f_1$ into $f_2$ and not of $f_1$ and $f_2$ individually.

Proof.  Choose a $t \in T$ such that $f_2(x) = f_1(t^{-1}x)\,\mu(t^{-1})$. Then

(1)  $I(1:2) = \int f_1(x) \log \frac{f_1(x)}{f_2(x)}\, d\Psi(x) = -\log \mu(t^{-1}) + \int f_1(x) \log \frac{f_1(x)}{f_1(t^{-1}x)}\, d\Psi(x)$.

To show that the value of the integral is a function of $t$ only and does
not depend on $f_1$, choose any $f_0 \in \{f_\omega\}$ and let $f_1(x) = f_0(s^{-1}x)\,\mu(s^{-1})$.
Then, with $y = s^{-1}x$, $t^{-1}y = s^{-1}t^{-1}x$, and (1) can be rewritten as

$I(1:2) = -\log \mu(t^{-1}) + \int f_0(y) \log \frac{f_0(y)}{f_0(t^{-1}y)}\, d\Psi(y)$.

A similar proof holds for $I(2:1)$ and the proof is complete.
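As an illustration that is not part of the original text, the statement of Theorem 2.6 can be checked by hand on a family where the group is the (Abelian) group of translations. The sketch below assumes the unit-variance normal location family, with Lebesgue measure in the role of $\Psi$ and the factor $\mu$ identically one; the symbols $\theta$, $c$ and $t_c$ are introduced here only for the example.

```latex
f_\theta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2}, \qquad
t_c(x) = x + c, \qquad
f_{\theta + c}(x) = f_\theta(t_c^{-1}x), \qquad \mu \equiv 1 .

% With f_1 = f_{\theta_1} and f_2 = f_{\theta_1 + c}:
I(1:2) = \int f_{\theta_1}(x)\, \log \frac{f_{\theta_1}(x)}{f_{\theta_1 + c}(x)}\, dx
       = \frac{c^2}{2} .
```

The value depends on the translation $c$ carrying $f_1$ into $f_2$ and not on $\theta_1$, which is what the theorem asserts for this family.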


Whenever it happens that equality of the K-L numbers for $X$ and for $Y$
implies that the same transformation that carries $f_1$ into $f_2$ also carries
$g_1$ into $g_2$, it will follow also that for some $t \in T$, $Y$ and $t(X)$ have the
same distributions under each hypothesis; this will be denoted $Y \overset{d}{=} t(X)$.
In such cases it is clear that equality of K-L numbers implies equality
of risks. That all the conditions on the group $T$ given above are not
necessary for equality of the K-L numbers to imply equality of risks appears
immediately from the case of normal distributions, where the group is not
Abelian, the 'Jacobians' are not constants, and the correspondence between
transformations and K-L numbers is not 1-1, but still equality of K-L
numbers implies equality of risks.

Lemma 2.6.  $Y \overset{d}{=} t(X)$ implies $R_X = R_Y$, and if the likelihood ratios,
$f_2/f_1$ and $g_2/g_1$, are monotone in the same direction, then $R_X = R_Y$ implies
that $Y \overset{d}{=} t(X)$.

Proof.  The first statement is clear. Without loss of generality,
let $X$ and $Y$ have the common density $h$ under $H_1$ and densities $f$ and $g$
respectively under $H_2$. It then suffices to show that $f = g$, for then the
same transformation that carries $h$ into $f$ carries $h$ into $g$.

From Theorem 2.1, if $R_X = R_Y$, then

(1)  $\int_{\{f(x) \le \eta h(x)\}} h(x)\, d\Psi(x) = \int_{\{g(x) \le \eta h(x)\}} h(x)\, d\Psi(x)$  for all $\eta$.

Since all densities are assumed positive together, and since the two sets
of integration are nested by the monotonicity of the likelihood ratios, it
follows from (1) that they can differ only by a set of $\Psi$-measure zero. Hence

(2)  $\{x : f(x) \le \eta h(x)\} = \{x : g(x) \le \eta h(x)\}$.

If $f(x_0) < g(x_0)$, then an $\eta$ can be found such that $f(x_0) < \eta h(x_0) < g(x_0)$,
contradicting (2). If $f(x_0) > g(x_0)$, then a similar contradiction arises.
Therefore, $f = g$.

As an example of such a class of densities and transformations as is
being considered in this section, consider the $\Gamma$-distributions:

(2.3.1)  $f_\omega(x) = \frac{\omega^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\omega x}$,  $(\omega > 0)$


with  T- 

Suppose the density of $X$ has parameter $\omega_1$ under $H_1$ and $\omega_2$ under $H_2$, and
that of $Y$ has parameter $\lambda_1$ under $H_1$ and $\lambda_2$ under $H_2$. Let $\omega_2 = a\omega_1$. Then

(2.3.2)  $I_X(1:2) = \alpha\left[\log\frac{\omega_1}{\omega_2} - 1 + \frac{\omega_2}{\omega_1}\right] = \alpha\left[\log\frac{1}{a} - 1 + a\right]$,

and

         $I_X(2:1) = \alpha\left[\log\frac{\omega_2}{\omega_1} - 1 + \frac{\omega_1}{\omega_2}\right] = \alpha\left[\log a - 1 + \frac{1}{a}\right]$.

Equations (2.3.2) give the K-L numbers explicitly as functions of $a$,
where $a$ corresponds to the transformation carrying $f_1$ into $f_2$. If
$\lambda_2 = b\lambda_1$, then the expressions for the K-L numbers for $Y$ are given by
(2.3.2) with $a$ replaced by $b$. In the question of equality of information
numbers, then,

(2.3.3)  $I_X(1:2) = I_Y(1:2)$ if and only if $\log\frac{b}{a} = b - a$,

and

         $I_X(2:1) = I_Y(2:1)$ if and only if $\log\frac{b}{a} = \frac{1}{a} - \frac{1}{b}$.

Equality of information numbers in this case implies that $ab(b-a) = b-a$
and hence that either (i) $a = b$, or (ii) $ab = 1$. If (ii) but not (i)
holds, then $\log b - b = \log\frac{1}{b} - \frac{1}{b}$. This can easily be shown not to be
true for $b \ne 1$. Therefore $a = b$ and an example is provided in which the
relation between the K-L numbers and the group of transformations is 1-1.
Also note that if the K-L numbers are equal then for some $c > 0$, $\omega_1 = c\lambda_1$
and $\omega_2 = c\lambda_2$, i.e., $Y$ and $cX$ have the same distribution.
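The closed forms for the K-L numbers of the Gamma family can be checked numerically. The following is a minimal sketch, assuming the density has shape $\alpha$ and rate $\omega$ as in (2.3.1); the function names, the midpoint-rule integrator, and its truncation point are choices made here for illustration and do not come from the original text.

```python
import math


def gamma_density(x, alpha, omega):
    """Gamma density with shape alpha and rate omega (assumed form of (2.3.1))."""
    return omega**alpha * x ** (alpha - 1) * math.exp(-omega * x) / math.gamma(alpha)


def kl_closed_form(alpha, a):
    """Return (I(1:2), I(2:1)) when omega_2 = a * omega_1; depends on a only."""
    i12 = alpha * (math.log(1.0 / a) - 1.0 + a)
    i21 = alpha * (math.log(a) - 1.0 + 1.0 / a)
    return i12, i21


def kl_numeric(alpha, w1, w2, upper=80.0, steps=80000):
    """Midpoint-rule approximation of the integral of f1 * log(f1 / f2)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        f1 = gamma_density(x, alpha, w1)
        # Both densities share the shape alpha, so the log ratio is linear in x.
        log_ratio = alpha * math.log(w1 / w2) - (w1 - w2) * x
        total += f1 * log_ratio * h
    return total


if __name__ == "__main__":
    alpha = 2.0
    print(kl_closed_form(alpha, 3.0))
    # Two different pairs of rates with the same ratio a = 3.
    print(kl_numeric(alpha, 1.0, 3.0), kl_numeric(alpha, 0.5, 1.5))
```

Running the two numerical integrals with different rates but the same ratio gives the same value, in line with Theorem 2.6. Replacing $a$ by $1/a$ swaps the two K-L numbers, which is the case $ab = 1$ excluded in the text.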

3.  Designs for a Binomial Testing Problem.

3.1.  The Problem.  In this section consideration is given to specific
design problems in which the random variables have binomial distributions.
Again it is supposed that there are two hypotheses, $H_1$ and $H_2$, with $\zeta$
the a priori probability that $H_1$ is the true hypothesis, and that one
must decide which of the two hypotheses is true with a loss of one if the
decision is incorrect and no loss if it is correct. There are available
two random variables, $X$ and $Y$, having binomial distributions with
parameters $p$ and $q$, respectively, under $H_1$ and parameters $q$ and $p$,
respectively, under $H_2$.

                          X    Y
    ($\zeta$)      $H_1$      p    q
    ($1-\zeta$)    $H_2$      q    p

Suppose that the observations are independent, the total number of
observations to be taken, $n$, is fixed, and that the cost of observations
is independent of the true hypothesis, the random variable observed, and
of the result of the observation.


The first problem considered is that of non-sequential designs, i.e.,
before experimentation it must be decided which of the observations are
to be of $X$ and which of $Y$.

The second problem treated is that of sequential designs, i.e.,
rules which for each $j < n$ tell one, as a function of the information
available after the $j$-th experiment, which random variable to observe
on the $(j+1)$-st experiment.

In each case the principle of choice among possible designs is,
of course, that of minimizing the Bayes risk.


3.2.  Non-sequential Designs.  Since the observations are assumed to
be independent, the non-sequential design problem reduces to determining
for each $\zeta$ the optimum number of observations of $X$.

Let $R_r(\zeta)$ denote the risk against $\zeta$ if $X$ is observed $r$ times and
$Y$ $n-r$ times. Assume for definiteness, and without loss of generality,
that $p > q$ and note that by the evident symmetry, $R_r(\zeta) = R_{n-r}(1-\zeta)$.
Furthermore, there is no loss of generality if it is assumed that
$p(1-p) > q(1-q)$, for if not, one would, by interchanging $p$ and $1-p$, $q$ and
$1-q$, and $X$ and $Y$, find oneself in the assumed case.

As before, it will be convenient to consider $\eta = \zeta/(1-\zeta)$ rather than
$\zeta$ itself much of the time.

For general $n$, the solution is characterized by a division of the
interval $[0,1]$ into intervals with the property that for $\zeta$ in a given
interval a certain number of observations of $X$, i.e., a certain value of
$r$, is optimal. In some of the intervals the optimum value of $r$ is not
unique.

The general equation for the value of $R_r(\zeta)$ is given by


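(3.2.1)  $R_r(\zeta) = \sum_{k=0}^{n} \sum_{i} \binom{r}{i}\binom{n-r}{k-i} \min\left\{ p^{r-i}(1-p)^{i} q^{n-r-k+i}(1-q)^{k-i}\,\zeta,\ q^{r-i}(1-q)^{i} p^{n-r-k+i}(1-p)^{k-i}\,(1-\zeta) \right\}$.

Here $i$ of the $r$ observations of $X$ and $k-i$ of the $n-r$ observations of $Y$
are zeros.

The preceding characterization of the solution follows from the fact
that $R_r$ is piecewise linear for each value of $r$.

The turning points of $R_r$ occur for those $\zeta$ for which the two quantities
whose minimum is taken in (3.2.1) are equal, i.e., for

$\eta = \left(\frac{q}{p}\right)^{2r-n} \left[\frac{q(1-p)}{p(1-q)}\right]^{k-2i}$.

Since $\frac{q(1-p)}{p(1-q)} < 1$, the first turning point occurs
for that $k$ and $i$ which maximize $k-2i$, which is $k = n-r$ and $i = 0$. Thus the
first turning point of $R_r$ is $\left(\frac{q}{p}\right)^{r}\left(\frac{1-p}{1-q}\right)^{n-r}$, which is a decreasing function
of $r$. For all $\zeta$ such that $\eta$ lies below this point, $R_r(\zeta) = \zeta$.

The functions $R_r$ can now be compared for small $\zeta$ or, equivalently,
for small $\eta$.

(3.2.2)  $R_r(\zeta) = \zeta$  for all $r$ and for $\eta \le \left(\frac{q}{p}\right)^{n}$,

         $R_n(\zeta) < R_r(\zeta)$  for $r \ne n$ and for $\left(\frac{q}{p}\right)^{n} < \eta < \left(\frac{q}{p}\right)^{n-1}\frac{1-p}{1-q}$.

Thus there is complete indifference for $0 < \eta \le \left(\frac{q}{p}\right)^{n}$ and $\left(\frac{q}{p}\right)^{n}$ is the
left end point of an interval in which the unique optimum value of $r$ is $n$.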
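The risk of a non-sequential design can be computed by direct enumeration, which gives a way to check the indifference region numerically. The sketch below assumes the 0-1 loss and the symmetric parameter table of Section 3.1; it sums over the numbers of successes of $X$ and of $Y$ rather than over the indices of (3.2.1), and the function names and the values $p = 0.6$, $q = 0.3$ are hypothetical choices made for illustration.

```python
from math import comb


def risk(r, n, p, q, zeta):
    """Bayes risk when X is observed r times and Y n - r times.

    Under H1, X ~ Bernoulli(p) and Y ~ Bernoulli(q); under H2 the roles swap.
    """
    total = 0.0
    for i in range(r + 1):            # successes among the observations of X
        for j in range(n - r + 1):    # successes among the observations of Y
            c = comb(r, i) * comb(n - r, j)
            p1 = p**i * (1 - p) ** (r - i) * q**j * (1 - q) ** (n - r - j)
            p2 = q**i * (1 - q) ** (r - i) * p**j * (1 - p) ** (n - r - j)
            total += c * min(zeta * p1, (1 - zeta) * p2)
    return total


def optimal_r(n, p, q, zeta, tol=1e-12):
    """All values of r whose risk is within tol of the minimum."""
    risks = [risk(r, n, p, q, zeta) for r in range(n + 1)]
    best = min(risks)
    return [r for r, value in enumerate(risks) if value <= best + tol]


if __name__ == "__main__":
    n, p, q = 3, 0.6, 0.3          # p > q and p(1-p) > q(1-q)
    for zeta in (0.10, 0.15):
        eta = zeta / (1 - zeta)
        print(f"eta = {eta:.3f}: optimal r = {optimal_r(n, p, q, zeta)}")
```

With these values $(q/p)^n = 0.125$, so $\zeta = 0.10$ falls in the indifference region and $\zeta = 0.15$ falls in the interval where only $r = n$ is optimal.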

To push the analysis a bit further along the $\eta$ axis, consider the
equations giving the second segments of $R_r$.

(3.2.3)  $R_r(\zeta) = q^{r}(1-p)^{n-r} + \zeta\left[1 - p^{r}(1-q)^{n-r} - q^{r}(1-p)^{n-r}\right]$.

The intersections of these lines, for $r < n$, with that for $r = n$ occur at

(3.2.4)  $\eta_r = \left(\frac{q}{p}\right)^{r} \frac{(1-p)^{n-r} - q^{n-r}}{(1-q)^{n-r} - p^{n-r}}$.

Now setting $t = n-r-1$, $\eta_r > \eta_{r+1}$ if and only if

(3.2.5)  $p\,\frac{(1-p)^{t+1} - q^{t+1}}{(1-q)^{t+1} - p^{t+1}} > q\,\frac{(1-p)^{t} - q^{t}}{(1-q)^{t} - p^{t}}$.

Since $1-q > p$, both denominators are positive and one can obtain the
equivalent relation

$[(1-p)^{t} + (1-p)^{t-1}q + \dots + q^{t}]\,[(1-q)^{t-1}p + (1-q)^{t-2}p^{2} + \dots + p^{t}]$
$\quad > [(1-q)^{t} + (1-q)^{t-1}p + \dots + p^{t}]\,[(1-p)^{t-1}q + (1-p)^{t-2}q^{2} + \dots + q^{t}]$.

Since adding $(1-q)^{t}\sum_{k=0}^{t}(1-p)^{t-k}q^{k}$ to the left side and
$(1-p)^{t}\sum_{k=0}^{t}(1-q)^{t-k}p^{k}$ to the right side will yield an identity,
(3.2.5) is equivalent to

$\sum_{k=0}^{t}\left[(1-p)^{t}(1-q)^{t-k}p^{k} - (1-q)^{t}(1-p)^{t-k}q^{k}\right] > 0$.

This in turn can be written as

(3.2.6)  $\sum_{k=0}^{t} (1-p)^{t-k}(1-q)^{t-k}\left[(1-p)^{k}p^{k} - (1-q)^{k}q^{k}\right] > 0$.

But (3.2.6) clearly holds, as $p(1-p) > q(1-q)$, and hence $\eta_r$ is strictly
decreasing in $r$.
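The monotonicity of the intersection points can be checked numerically for particular parameters. The following is a small sketch, assuming the form of (3.2.4) and (3.2.6) as written above; the function names and the values $p = 0.6$, $q = 0.3$, $n = 5$ are illustrative assumptions.

```python
def eta_r(r, n, p, q):
    """Intersection of the second segment of R_r with that of R_n, for r < n."""
    m = n - r
    if m < 1:
        raise ValueError("eta_r is defined only for r < n")
    return (q / p) ** r * ((1 - p) ** m - q**m) / ((1 - q) ** m - p**m)


def sum_326(t, p, q):
    """Left side of (3.2.6); each term with k >= 1 is positive when p(1-p) > q(1-q)."""
    return sum(
        ((1 - p) * (1 - q)) ** (t - k) * (((1 - p) * p) ** k - ((1 - q) * q) ** k)
        for k in range(t + 1)
    )


if __name__ == "__main__":
    n, p, q = 5, 0.6, 0.3
    for r in range(n):
        print(r, eta_r(r, n, p, q))
```

For these values the printed sequence decreases in $r$ and ends at $(q/p)^{n-1}$ for $r = n-1$.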


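The intersection of the second segment of $R_n$ with that of $R_{n-1}$ is at
$\eta_{n-1} = \left(\frac{q}{p}\right)^{n-1} > \left(\frac{q}{p}\right)^{n}$. And since the second turning point of $R_n$ occurs at
$\left(\frac{q}{p}\right)^{n-1}\frac{1-q}{1-p} > \eta_{n-1}$, $R_n < R_{n-1}$ for $\eta$ just below $\eta_{n-1}$ and $R_n > R_{n-1}$ for $\eta$
just above it. Thus, the solution for a somewhat larger range of $\eta$ can be given.

(3.2.7)  For $0 < \eta < \left(\frac{q}{p}\right)^{n}$ there is indifference between all $r$'s;
         for $\left(\frac{q}{p}\right)^{n} < \eta < \left(\frac{q}{p}\right)^{n-1}$, $n$ is the optimal value for $r$;
         and for $\left(\frac{q}{p}\right)^{n-1} < \eta < u$, $n-1$ is the optimal value for $r$.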


The value of $u$ in (3.2.7) is dependent upon conditions of the form
$p^{a}(1-p)^{b} \gtrless q^{a}(1-q)^{b}$.

Not only does the value of $u$ vary with cases, but the optimal value
of $r$ in the interval having $u$ as its left end point will also vary with
cases. It would appear that in the attempt to gain a complete solution,
one shortly becomes bogged down in a morass of special cases.

Certain solutions for small values of $n$ were computed and are given
below as they appear in relation to the $\eta$-axis for $0 < \eta < 1$, which
corresponds to $0 < \zeta < 1/2$. By the symmetry about $\zeta = 1/2$ noted before,
the solution for all $\zeta$ can easily be determined from these. (I denotes
indifference.)

B.Z-1 

optimal  r»  I ; 1 ; 

7 0 a 1 

p 

n “ 2;  1 < (l-p)^+(l-q)^: 


optimal  r ; I ; 2:1 i 2 ; 

7 0 /£^2  a fl  A.tP=fl  1 

p p p i-p*q 

1 > (l-p)^+  (l-q)^: 


optimal  r : I ; 2 : 1 ? 0 : 

7 0 /q^2  a hB  1 

p 1-q 




3:  (l-p)^-(l-q)^  < 1-^  : 


cptlmal  r j I 


«,2 


flisr 

pTi^ 


or 


optimal  r : I ; 3 ; 

7 0 (5)3  (?)2 


; 3 -»  ^--2. 


P pu-q 
2 


gji-p;  . q(l-g) 

(1-1 


(^)  * P^i-p) 

according  as  ^q).~.q(l.p)-  jg  greater  or  less  than  (t^) 

p^(3-2p)-p(l-q)^ 


1 

-ia»2 


1-^  < (l-p)^+(l-q)^  < !♦  pq  : 


optimal  r :I  : 3 ; 2:3  t Z t 

T <?)'  * i 

l*pq  < (i-p)^+(l-q)^  < 1+  if^q  • 


optimal  r ;I  ; 3 ; 2;3  ;2;0; 

.here  0 = l-(lza)!-(l-p)^-2w(l-p)  . 

l-(l-p)^-(l-q)^-2pq(l-q) 


These are the solutions for what appears to be about half of the cases
for $n = 3$.

Thus, for small values of $\eta$ the solution has been found for all cases,
while for the remaining $\eta$'s there is no apparent pattern and the solutions
(to say nothing of their computation) even for small $n$ lead one to the
conclusion that it is just about hopeless to seek a complete general
solution. It should be noted that the symmetric choice of the parameters
above is clearly a help rather than a hindrance; nearly any choice of
parameters will yield a similar morass of cases. The exceptions are
those choices of the parameters for which, for $n = 1$, $R_0 \le R_1$ or $R_0 \ge R_1$.
In such cases, the optimum value of $r$ is zero or $n$, respectively, for
all $n$ and $\zeta$.


3.3  Sequential Designs.  Suppose that there is a total of $n$ experiments
to be performed, or observations to be taken. Let $\zeta$ denote the a priori
probability that $H_1$ is true and $\zeta_j$ the a posteriori probability after
having observed the results of the first $j$ experiments. Now to obtain
the optimal sequential design one must decide after the $j$-th observation,
as a function of the information obtained in the previous experiments,
which is contained in $\zeta_j$, and the number of observations remaining,
$n-j$, whether to observe $X$ or $Y$ on the $(j+1)$-st experiment.

Let $f_n(\zeta)$ be the Bayes risk if the optimal sequential design for
$n$ experiments were used. If, now, $n+1$ experiments were contemplated
and $X$ were observed first, then the optimal design followed for the remaining
$n$ experiments, the risk would be

(3.3.1)  $g(X,n,\zeta) = [p\zeta + q(1-\zeta)]\, f_n\!\left(\frac{p\zeta}{p\zeta + q(1-\zeta)}\right) + [(1-p)\zeta + (1-q)(1-\zeta)]\, f_n\!\left(\frac{(1-p)\zeta}{(1-p)\zeta + (1-q)(1-\zeta)}\right)$.

If $Y$ were observed first and then the optimal design followed for the
remaining $n$ steps, the risk would be

(3.3.2)  $g(Y,n,\zeta) = [q\zeta + p(1-\zeta)]\, f_n\!\left(\frac{q\zeta}{q\zeta + p(1-\zeta)}\right) + [(1-q)\zeta + (1-p)(1-\zeta)]\, f_n\!\left(\frac{(1-q)\zeta}{(1-q)\zeta + (1-p)(1-\zeta)}\right)$.

Hence, the following functional equation is obtained.

(3.3.3)  $f_{n+1}(\zeta) = \min\left(g(X,n,\zeta),\ g(Y,n,\zeta)\right)$.

The design problem is to determine those $\zeta$ for which $g(X,n,\zeta) < g(Y,n,\zeta)$,
$g(X,n,\zeta) = g(Y,n,\zeta)$, and $g(X,n,\zeta) > g(Y,n,\zeta)$, respectively. If there
were $n+1$ observations remaining to be taken, then for $\zeta$ in the first
set, $X$ should be observed next, for $\zeta$ in the third set, $Y$ should be
observed next, while for $\zeta$ in the second set there is indifference between
$X$ and $Y$, since one would do equally well starting with either.
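The functional equation lends itself to a direct recursive computation for small $n$. The sketch below assumes the form of (3.3.1) to (3.3.3) with $f_0(\zeta) = \min(\zeta, 1-\zeta)$ as the risk when no observations remain; the helper names `make_design` and `choice`, and the parameter values, are hypothetical and serve only as an illustration. The recursion is exponential in $n$, so it is suitable for small $n$ only.

```python
def make_design(p, q):
    """Return (f, g, choice) for the binomial problem with parameters p and q."""

    def f(n, zeta):
        # Bayes risk of the optimal sequential design with n observations left.
        if n == 0:
            return min(zeta, 1.0 - zeta)
        return min(g("X", n - 1, zeta), g("Y", n - 1, zeta))

    def g(first, n, zeta):
        # Risk if `first` is observed now and the optimal design is then
        # followed for the remaining n observations.
        s1, s2 = (p, q) if first == "X" else (q, p)   # success prob. under H1, H2
        total = 0.0
        for l1, l2 in ((s1, s2), (1.0 - s1, 1.0 - s2)):
            m = zeta * l1 + (1.0 - zeta) * l2         # marginal prob. of outcome
            total += m * f(n, zeta * l1 / m)          # posterior prob. of H1
        return total

    def choice(n, zeta, tol=1e-12):
        # Variable to observe when n + 1 observations remain ("I" = indifference).
        gx, gy = g("X", n, zeta), g("Y", n, zeta)
        if abs(gx - gy) <= tol:
            return "I"
        return "X" if gx < gy else "Y"

    return f, g, choice


if __name__ == "__main__":
    f, g, choice = make_design(0.6, 0.3)
    for zeta in (0.15, 0.30, 0.40, 0.60):
        print(zeta, choice(0, zeta), choice(1, zeta), round(f(2, zeta), 6))
```

Evaluating `choice` on a grid of $\zeta$ traces out the X-, Y- and I-intervals for a given pair $p$, $q$ without any case analysis, at the price of giving no general formula for the end points.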

For $n = 1$, the sequential and non-sequential designs coincide and $f_1$
is easily found. In theory, one can then, by use of the equation (3.3.3),
compute $f_n$ for any $n$. This method is so complicated as to be practically
prohibitive. A method is given below for obtaining the sequential designs without
having to compute each of the risk functions. Even this method bogs down in cases
as $n$ increases; however, for given values of $p$ and $q$, it would be possible to use.

Now it is clear that $f_1$ is piecewise linear and it is concave. It is
easily seen then that both $g(X,1,\zeta)$ and $g(Y,1,\zeta)$, and therefore $f_2$, are
also piecewise linear and concave. Furthermore, the turning points of
$g(X,n,\zeta)$ are precisely those $\zeta$ such that either $\frac{p\zeta}{p\zeta + q(1-\zeta)}$ or
$\frac{(1-p)\zeta}{(1-p)\zeta + (1-q)(1-\zeta)}$ is a turning point of $f_n$. Likewise, the turning
points of $g(Y,n,\zeta)$ are those $\zeta$ such that either $\frac{q\zeta}{q\zeta + p(1-\zeta)}$ or
$\frac{(1-q)\zeta}{(1-q)\zeta + (1-p)(1-\zeta)}$ is a turning point of $f_n$. In terms of the variable $\eta$,
$\eta$ is a turning point of $g(X,n,\zeta)$ if and only if $\frac{p}{q}\eta$ or $\frac{1-p}{1-q}\eta$ is a
turning point of $f_n$, and is a turning point of $g(Y,n,\zeta)$ if and only if
$\frac{q}{p}\eta$ or $\frac{1-q}{1-p}\eta$ is a turning point of $f_n$.

For $n = 1$, the solution can be expressed by diagram.

    optimal choice :     I     :     X     :
    $\eta$              0          q/p          1

Since the same kind of symmetry about $\zeta = 1/2$ is present as was noted in
the preceding section, to give the solution up to $\eta = 1$ is sufficient.
The turning points of $f_1$ are (in terms of $\eta$), $q/p$, $1$, and $p/q$.

Arranging the turning points of $g(X,n,\zeta)$ and $g(Y,n,\zeta)$ in order,
one has, for $q(1-q)^2 < p(1-p)^2$,

    for g(X,n,ζ):  (q/p)^2                    q/p   q(1-q)/[p(1-p)]                 1
    for g(Y,n,ζ):           q(1-p)/[p(1-q)]                           (1-p)/(1-q)   1

while if $q(1-q)^2 > p(1-p)^2$, one has the turning points

    for g(X,n,ζ):  (q/p)^2                    q/p                 q(1-q)/[p(1-p)]   1
    for g(Y,n,ζ):           q(1-p)/[p(1-q)]         (1-p)/(1-q)                     1


In each case, these turning points divide the interval $(0,1)$ into
sub-intervals. If $\eta \le (q/p)^2$ and $X$ is observed, then $\eta_1$ will
be at most $q/p$ or at most $\frac{q^2(1-p)}{p^2(1-q)}$ according as the observed value
of $X$ is 1 or 0. Since in either case $\eta_1 \le q/p$, it would be optimal to
observe $Y$ at the next stage. Similarly, if $Y$ were observed first, then
$\eta_1 \le q/p$ regardless of which value $Y$ assumed. Hence, it would then be
optimal to observe $X$ at the next stage. Now, since for independent
observations the order is immaterial, the two risks must coincide for
$\eta \le (q/p)^2$. In like manner it is found that in each of the two cases
which are distinguished by the ordering of the turning points, the interval
whose left end point is $q/p$ is also an interval of indifference. Knowing
that $g(X,n,\zeta)$ and $g(Y,n,\zeta)$ are each piecewise linear and concave, and
that they coincide at $\eta = 1$ as well as on these two intervals of indifference, it
is sufficient to determine the solution for $n = 2$. There are two cases for
the solution

(3.3.4)  For  p(l-p)^  > q(l-q)^  ; 

optimal  choice  t I ; I : I ; I ; 

^ 0 (V  a 

p p pvj--p; 

For  p(l-p)^  < q(l-q)^  : 
optimal  choice  t I : I t I t Y t 

7 0 (?)"  I ^ 1 

Now the method for obtaining the solution for $n+1$ from that for $n$
follows that given above with $n = 1$. From the turning points of $f_n$
determine the turning points of $g(X,n,\zeta)$ and $g(Y,n,\zeta)$ and arrange them
in order (considering the necessary cases). Determine those $\eta$ for which
both $\frac{p}{q}\eta$ and $\frac{1-p}{1-q}\eta$ lie in Y- or I-intervals of the solution for $n$;
determine those $\eta$ for which both $\frac{q}{p}\eta$ and $\frac{1-q}{1-p}\eta$ lie either in an X-
or an I-interval of the solution for $n$. The intersection of these two
sets will be the indifference intervals in the solution for $n+1$. From this
information, the order of the turning points of the two functions $g(X,n,\zeta)$
and $g(Y,n,\zeta)$, and the concavity of these two functions, most of the solution
for $n+1$ can be inferred. For $n = 1$, the entire solution for $n = 2$ is determined
with no further work, but for most of the cases for larger values of $n$, the
two functions $g(X,n,\zeta)$ and $g(Y,n,\zeta)$ will have to be computed and compared
at a few isolated points.

It is the computation and comparison of these functions at isolated
points, as well as the multiplicity of cases for larger $n$, that makes even
this method imperfect for obtaining general solutions. However, for given
values of $p$ and $q$ this procedure could be used to determine the optimal
design for moderate $n$ without undue difficulty.

This section is concluded by giving the solution for $n = 3$ after
first remarking that the usefulness of the method is not restricted to
problems in which the parameters are symmetric.

(3.3.5)  ?or  p(l-p)^  > q(l«q)^  and  p(l-p)'^  < q(l-q)^  : 


optimal  choice  ; I : It  I t I t 

•77  0 (3.\3  a k. 


P^(i-P) 

For  p(l-p)^  > q(l-q)^  and  p(l-p)^  > q(l-q)^  s 


k£ 
q 


optimal  choice  j L 


: I 


1 


q^d-q)  a alkai 


^P^  p2^,pj  p ^l-p)  p 


u-p)' 


For  p(l-p)^  < q(l-q)^  emd  q^(l-q)^  < p^(l-p)^  : 


optimal  choice  i L 


X i 


7 


^P^  P(l^  p2(i.p)  P p(l^ 
For  p(l-p)^  < q(l-q)^  and  q^(l-q)-^  > p'^(l-p)^  ; 


optimal  choice  t I ; X ; I ; 

7 ° <-1^  <?>^ 


JL_i__L 


a a 

p p 


m 


I t 


where  A - 

(l-q)'‘(l-p-q)-p^(l-p) 


4.  Some Non-truncated

In the preceding sections attention has been focused entirely on
design problems in which the sample size was fixed. A problem in which
experimentation is terminated by a sequential stopping rule will now be
considered.

4.1.  A Mixed Random Walk.  Suppose, as in Section 3, that the two
random variables, $X$ and $Y$, have binomial distributions with parameters
under the two hypotheses, $H_1$ and $H_2$, as given by:

                          X    Y
    ($\zeta$)      $H_1$      p    q
    ($1-\zeta$)    $H_2$      q    p        ($p > q$, $p(1-p) > q(1-q)$).

Again, $\zeta$ is the a priori probability that $H_1$ is the true hypothesis
and it is given that one must decide which of the hypotheses is the true
one with losses as described in the previous sections.

Let an observation of $X$ and an observation of $Y$ have the same cost.
A design is now sought which will minimize the expected cost of achieving
a Bayes risk from the terminal decision of at most a fixed amount, $r$. This
is equivalent to finding that design which will minimize the expected number
of observations required to move the a posteriori probability for $H_1$ to a
position either in the interval $[0,r]$ or in the interval $[1-r,1]$.

Let $\zeta_0 = \zeta$ and $\zeta_j$ denote the a posteriori probability for $H_1$ after
having made the first $j$ observations. It will be convenient to consider
the problem in terms of the variable $z_j = \log \eta_j = \log \frac{\zeta_j}{1-\zeta_j}$. Then let

    $a = \log \frac{p}{q}$,

    $b = \log \frac{1-q}{1-p}$,  and

    $A = \log \frac{1-r}{r}$.

Let $n$ denote the smallest value of $j$ for which either $\zeta_j \le r$ or
$\zeta_j \ge 1-r$. It is seen there are two random walks, both on the $z$ axis
with boundaries at $A$ and $-A$, one of which is determined by the results of
observations of $X$ and the other determined by the results of observations
of $Y$. After having made $j$ observations one finds that the walk has
arrived at the point $z_j$. Now the choice must be made as to whether
it is better that $z_{j+1}$ be determined by an observation of $X$ or
of $Y$, i.e., whether the next step should be taken in the X-walk or in
the Y-walk. A rule is desired prescribing for every situation which walk
should be taken in order to minimize the expected value of $n$.

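If at the $(j+1)$-st step, $X$ is observed, then

(4.1.1)  $z_{j+1} = z_j + a$  with probability $p$ under $H_1$ and $q$ under $H_2$,
         $z_{j+1} = z_j - b$  with probability $1-p$ under $H_1$ and $1-q$ under $H_2$.

Letting $E_i[\,\cdot \mid X]$ denote expectation under $H_i$ when $X$ is observed, it follows that

(4.1.2)  $E_1[z_{j+1} - z_j \mid X] = p \log \frac{p}{q} + (1-p)\log \frac{1-p}{1-q} = I_X(1:2)$,

         $E_2[z_{j+1} - z_j \mid X] = q \log \frac{p}{q} + (1-q)\log \frac{1-p}{1-q} = -I_X(2:1)$,

and      $E[z_{j+1} - z_j \mid X] = \zeta_j I_X(1:2) - (1-\zeta_j) I_X(2:1) = \zeta_j J_X - I_X(2:1)$.

Since the divergence, $J_X = I_X(1:2) + I_X(2:1)$, is always positive,
$E[z_{j+1} - z_j \mid X]$ is an increasing function of $\zeta_j$ and is zero for

(4.1.3)  $\zeta_j = \frac{I_X(2:1)}{J_X} = 1 - \frac{I_X(1:2)}{J_X}$,  i.e., for  $z_j = \log \frac{I_X(2:1)}{I_X(1:2)}$.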


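Similarly, if $Y$ is observed, then

(4.1.4)  $z_{j+1} = z_j - a$  with probability $q$ under $H_1$ and $p$ under $H_2$,
         $z_{j+1} = z_j + b$  with probability $1-q$ under $H_1$ and $1-p$ under $H_2$.

Also

(4.1.5)  $E_1[z_{j+1} - z_j \mid Y] = q \log \frac{q}{p} + (1-q)\log \frac{1-q}{1-p} = I_X(2:1)$,

         $E_2[z_{j+1} - z_j \mid Y] = p \log \frac{q}{p} + (1-p)\log \frac{1-q}{1-p} = -I_X(1:2)$,

         $E[z_{j+1} - z_j \mid Y] = \zeta_j J_X - I_X(1:2)$.

Hence, $E[z_{j+1} - z_j \mid Y]$ is also an increasing function of $\zeta_j$ and is zero for

(4.1.6)  $\zeta_j = \zeta^* = \frac{I_X(1:2)}{J_X}$,  i.e., for  $z_j = z^* = \log \frac{I_X(1:2)}{I_X(2:1)}$.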


To verify that $z^* > 0$, i.e., that $I_X(2:1) < I_X(1:2)$, let
$\phi(p) = I_X(1:2) - I_X(2:1)$. Then

$\phi(p) = (p+q)\log \frac{p}{q} + (2-p-q)\log \frac{1-p}{1-q}$,

$\phi''(p) = \frac{(p-q)(1-2p)}{p^2(1-p)^2}$.

With $p > q$ and $p(1-p) > q(1-q)$, then $1-q > p > q$ and $q < 1/2$. Hence, $\phi$ is
negative and concave for $p < q$, zero at $p = q$, convex and positive for
q<p<l/2,  and  concave  for  p>  l/2.  But  ^(q)*log-^^  > 0.  Therefore, 
0(p)  > 0 for  l-q>  p>  q. 
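The sign of the difference of the two K-L numbers can be checked numerically on a grid. The sketch below assumes the binomial K-L numbers as in (4.1.2) and (4.1.5); the function names and the value $q = 0.3$ are illustrative assumptions.

```python
import math


def kl_bernoulli(p, q):
    """K-L number of a Bernoulli(p) observation relative to Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def phi(p, q):
    """I_X(1:2) - I_X(2:1) for X ~ Bernoulli(p) under H1 and Bernoulli(q) under H2."""
    return kl_bernoulli(p, q) - kl_bernoulli(q, p)


def phi_closed_form(p, q):
    """The same difference, collected into two logarithmic terms."""
    return (p + q) * math.log(p / q) + (2 - p - q) * math.log((1 - p) / (1 - q))


def z_star(p, q):
    """Zero of the expected step of the Y-walk on the log-odds axis."""
    return math.log(kl_bernoulli(p, q) / kl_bernoulli(q, p))


if __name__ == "__main__":
    q = 0.3
    for p in (0.35, 0.45, 0.55, 0.65):
        print(p, phi(p, q), z_star(p, q))
```

On the grid shown, the difference is positive throughout $q < p < 1-q$ and vanishes at the end point $p = 1-q$.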


i j 
j < 


Therefore the $z$ axis can be divided into four parts and on them one
will have,

(4.1.7)  ^ ^ 0 for 

¥G  ^ “¥  ^j+r  ^ ® J 

j+i“ ^ ¥ Ifj^r ^ 


Thus, for $z_j > 0$ the X-walk yields an expected step greater in magnitude
than the Y-walk and the expected step is in the 'right' direction, i.e.,
towards the nearest boundary, $A$. For $z_j < 0$, the Y-walk enjoys the same
advantage, the nearest boundary being $-A$.

These considerations lead to the conjecture that, at least for $a$ small
relative to $A$, the optimal design is to take the X-walk on the $(j+1)$-st step
when $z_j > 0$ and the Y-walk otherwise. (It should be remarked that if
$p(1-p) < q(1-q)$, the same results will hold with $X$ and $Y$ interchanged.)
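The conjectured rule can be explored by simulation. The following is a rough Monte Carlo sketch, not a proof of anything: it assumes the step sizes $a = \log(p/q)$ and $b = \log\frac{1-q}{1-p}$ of (4.1.1) and (4.1.4), stops when $|z| \ge A$, and uses hypothetical names (`run_walk`, `design_m`, `mean_steps`) together with illustrative values of $p$, $q$ and $r$.

```python
import math
import random


def run_walk(choose, z0, A, p, q, h1_true, rng, max_steps=100000):
    """Simulate one mixed random walk; return (number of steps, final z)."""
    a = math.log(p / q)
    b = math.log((1 - q) / (1 - p))
    z, n = z0, 0
    while -A < z < A and n < max_steps:
        if choose(z) == "X":
            success = p if h1_true else q       # P(X = 1) under the true hypothesis
            z += a if rng.random() < success else -b
        else:
            success = q if h1_true else p       # P(Y = 1) under the true hypothesis
            z += -a if rng.random() < success else b
        n += 1
    return n, z


def design_m(z):
    """X-walk when z > 0, Y-walk otherwise (the rule conjectured in the text)."""
    return "X" if z > 0 else "Y"


def always_x(z):
    return "X"


def always_y(z):
    return "Y"


def mean_steps(choose, z0, A, p, q, h1_true, trials=2000, seed=0):
    """Average number of steps over independent simulated walks."""
    rng = random.Random(seed)
    total = sum(run_walk(choose, z0, A, p, q, h1_true, rng)[0] for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    p, q, r = 0.6, 0.3, 0.05
    A = math.log((1 - r) / r)
    for name, rule in (("M", design_m), ("X only", always_x), ("Y only", always_y)):
        print(name, mean_steps(rule, 0.5, A, p, q, True))
```

The printed averages are simulation estimates under one hypothesis and one starting point only; they indicate how the three rules compare in a particular case and carry sampling error.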

Now let $X_\infty$ denote that design which requires that $X$ be used at
each step and $Y_\infty$ that design which requires that $Y$ be used at each step.
Denote by $E[n \mid X_\infty, z, H_i]$ the expected number of steps in the X-walk with
$z$ as its starting point when $H_i$ is the true hypothesis. Using Wald's
well known approximations,


I 
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2kn_(k*^U  ) 


(4.1.3) 


i o jxx  ) 

— ( 

B[niioo,0\H  ] ■ ■ ■ ■■; » 

P log  I +(l-p)log  ^ 


A-S-2»f- 


-2At  -(A+eJ')t 

© ^ O 


E[nlToo,a',H^] 


2[n|T(ju,  »H2 


-2kt 


-(q  log  I ♦(l-q)log 


and 


f-kAVL  -U*-a  )n) 

‘-  «'-2*  ■■) 


-2Au 


.A. 


(p  log  J ♦(l-p)log 


kJB 


ulog‘  ulog^ 

where  u/O  satlafiee  po  ^ +(l-p)e  - 1, 

t log  ^ t log 

and  t/O  satisfiea  qe  ^ ♦(l*^-q)e  “ 1. 

It is easily seen that $u = -1$ and $t = 1$. Then recognizing the
denominators in the above expressions as K-L numbers, it follows that

(4.1.9)  lCnlToo,2r>E[n|loo,^  ] 

- X,  •[BCn'iIoo,  J,H3^>E[nlloo,  -(l-^;  ) ^[n|Ta>,  ^,H2]-ECn|loo, 

I-(l:2)-Ij(2:l)  f -2A_  -A-^  .2A_  A+5 

’ Ij{lJ2)Ij(2:l)  j ^(A-^'2A  )+(1- ^)(A-2J-2A  ) 


Noting  that  the  firat  factor  la  poaltive,  then  by  adding  1-1  to  the 
fraction  in  the  last  term  and  rearranging,  it  la  found  that 
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(4.1.10) 


?[n|Yoo,^?]-E[n|ZcD,^  1 

-2A  -A- a*  , 

^(A-g')-(l-^)(A♦^5)-2A'i;  ♦aAd-^) 

e -1  e -1 


- -U+A(2^-1)+2A  -2LX.  ^ 

e -1 


A+2* 


-2A  , 
e -1 


2A  . , 

' J 


. 


It  can  be  easily  verified  from  (4.1.10)  that  the  difference  is  zero 
for  ^ “ 0 and  7^  “ A.  It  will  be  shown  to  be  non-negative  for  0^  ^^A. 


J 


Noting  that  Xf  “ ^ 

l+e° 

4^( ^ ),  then 


and  denoting  the  last  member  of  (4.1.10)  by 
-22(' 


(4.1.11)  ^^«(^)  - Ub°-{  * "fl V 

1 e^^-l  l*e^  e^^-l  e ^-1 


2 — /g L ♦ j2 ) _ ltS_ — (& TiSL 

y .2  ' -2A  , 2A  , /,  \4  ' -2A 

(1+e  ) e -1  e -1  (l+e"  r e 


-2A  .-A-Z?  A+y  - 

+ S — -Jl*) 
2A  ' 

1 e^  -1 


I 


Simplifying  (4.1.11)  and  removing  positive  factors  yields 

(4.1.12)  iV"(??)  (2+6^+e"^)>e^ -(2+e^+e”^)e^^+e^^  . 

At  2f“0,  the  right  side  of  (4.1.12)  is  positive.  At  ^ ■ A,  it  equals 
e"^+2+e^-2e^^,  which  is  negative  at  least  for  A > log  ^ . By  differen- 
tiation the  right  side  is  found  to  be  decreasing  in  an  interval  [0,  JJ'*) 
and  Increasing  for  j"'’.  Hence,  H^”(3)  is  first  positive  and  then 
negative  as  ^ increases  from  0 to  A (A > log  ^);  i.e.,  is  first  convex 

and  then  concave.  It  remains  only  to  show  that  '(0)>0  to  assure  that 
4^(^  )>0  for  O^^iA. 

4>»(0)  rL-  4+ e^^(A-2)+2Ae^-2Ae"^-e“^(A+2)  . 


Denoting the right member by $S(A)$, successive differentiation shows that
$S(0) = S'(0) = S''(0) = S'''(0) = 0$ while the fourth derivative is positive
for all $A \ge 0$. Hence it can be concluded that $\psi'(0) > 0$ for $A > 0$ and by
the evident symmetry in the problem it follows that

(4.1.13)  $E[n \mid Y_\infty, z] - E[n \mid X_\infty, z]$  $> 0$ for $z > 0$,
                                                  $< 0$ for $z < 0$.

Thus the design which requires the use of $X$ at the $(j+1)$-st step if
$z_j > 0$ (and $Y$ if $z_j < 0$) coincides with the design requiring the use of
the random variable corresponding to the smaller of $E[n \mid X_\infty, z_j]$ and
$E[n \mid Y_\infty, z_j]$. It also coincides with the following design given in terms
of the K-L information numbers. Let $J_X(\zeta) = \zeta I_X(1:2) + (1-\zeta) I_X(2:1)$.
Then $J_Y(\zeta) = \zeta I_X(2:1) + (1-\zeta) I_X(1:2)$ and $J_X(\zeta) > J_Y(\zeta)$ for $\zeta > 1/2$.
Hence, the design just described could also be expressed by the rule:
at the $(j+1)$-st step use the random variable corresponding to the larger
of the numbers $J_X(\zeta_j)$ and $J_Y(\zeta_j)$.

Denote this thrice-described design by $M$. While $M$ has not yet been
shown to be the optimum design, it can be shown to be better than either
$X_\infty$ or $Y_\infty$. This comes as a special case of the next result, which
concludes this section.

By a stationary design will be meant a design in which the choice at
the $(j+1)$-st step is a function only of the a posteriori probability after
the $j$-th step.

Lemma 4.1.  Let $X$ and $Y$ have densities $f_i$ and $g_i$, respectively, under
hypothesis $H_i$ such that both $\log f_2/f_1$ and $\log g_2/g_1$ assume positive and
negative values with positive probability. Let $D_1$ and $D_2$ be two stationary
designs and $D$ that design which requires, at the $(j+1)$-st step, the random
variable corresponding to the smaller of $E[n \mid D_1, z_j]$ and $E[n \mid D_2, z_j]$.
Then $E[n \mid D, z] \le \min\{E[n \mid D_1, z],\ E[n \mid D_2, z]\}$.

Proof.  For  any  set  K In  the  Interval  [-A,A1,  let  ?^denote  Its 
complement  in  [--4,41.  Let 

(l)  =*  : for  ■ 3*  * requires  X at  the  j*l®^  step^. 


Let 


^ f (x) 

^x^  v)  “ ?;f^(X)*(l-t)f2(I)  » 

Tj(C)  ^ 

Tj(  ) - log  , where  ff  - log 


Than 


(2) 


BCnlD^,y  ] 


(3) 


H(^) 


l^Ej,CnlDj^,Tj(^)l  if  a'cT^, 

l+E^[njDj^,Ty(8' )]  if 

Let  H(S  ) - min^E^CnlDp  g*!,  B^[nlD2,  then 

( ^[nlDj^,S  ] for  -ge®, 

I E^CnlD2,2T]  for 

C l+E^[n|D^,Tj(aJ')]  for  ^ E n®  U P^  0®, 

l>E^[n(Dj^,Tj(  a*)!  for  ^ E <S>U  1^2 

Now  let  r=  n,P(n)Ur2n®.  Then 

^ C 1+Eg[n  D,Tj(a)] 

^l-E^L'n  D,Tj(2?)] 

Then  if  one  sets  G{  )"  H(  ^)-SLnih,  j, 


(4) 


2Cn/D,  a ] 


•'I  r r'' 


(5) 


) 


E^[G(Tj(^))l  7j£ 

Yc.(Tj(iS))3 


Now it is asserted that $G \ge 0$, for suppose that it assumes its minimum
at $z_0$. Then


(6) 


“(iCj  ■ < 


^ ■n  Tnitm  t sA  W'y 

( '■'''*1'  Oq//  ' 


iC  «r  n 
Oo*-  ' ’ 


But  for  an7  randon  rarlahle,  Z,  If  B[Zl"mln  Z,  then  Z*'Bln  Z «lth 
probability  1.  Therefore,  G(  - G(Tj(  for  Since 


fj(i) 


(7) 


'iU„) 


o( 


tjl) 


^o)  ' 


t) 


aJ 


Similarly,  if  P, 


g,(i). 


f,(i)  g,(i) 

Since both $\log \frac{f_2(X)}{f_1(X)}$ and $\log \frac{g_2(Y)}{g_1(Y)}$ are negative with positive
probability, it is seen that by a finite number of applications of the
above reasoning a point $z' \le -A$ can be reached such that $G(z') = G(z_0)$.
But $G(z) = 0$ for $|z| \ge A$. Hence, $G(z_0) \ge 0$. In view of the definition
of $G$, the proof is complete.

It is clear that the same analysis would apply to any finite number
of stationary designs.


5.  Ihfl  'iMo-arngd 

3.1.  General  Results.  The  statistical  problem  which  goes  under  this 
general  title  is  thr>t  of  finding  a design  which  will  .Bazimlze  the  sum  of  n 
independent  observations  in  the  following  situation:  let  Z end  T be  real 
Valued  random  variables  having  c.d.f. 's  and  G^,  respectively,  under 
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E - 


t 


hypotheaiB  (l”  1,2)  and  ^ be  the  a priori  probability  that  io  the 
true  hypothesis . The  problem  Is  to  devise  a sequential  design  which  will 
maximize  the  expected  value  of  the  sum  of  n observations,  each  of  which  is 
to  be  an  observation  either  of  2 or  7. 

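Let $f_i$ and $g_i$ be the densities corresponding to $F_i$ and $G_i$ with respect
to the measure $\psi$. Let $W_n(\zeta)$ denote the expected value of the sum of
the n observations if $\zeta$ is the a priori probability for $H_1$ and the optimal
design is used. If one observed X first and then continued for n-1
steps following the optimal rule, then the expected sum would be

(5.1.1)  $A_n = \zeta \int_{-\infty}^{\infty} t f_1(t)\,d\psi + (1-\zeta) \int_{-\infty}^{\infty} t f_2(t)\,d\psi$
         $\qquad + \int_{-\infty}^{\infty} W_{n-1}\!\left(\frac{\zeta f_1(t)}{\zeta f_1(t) + (1-\zeta) f_2(t)}\right) \bigl(\zeta f_1(t) + (1-\zeta) f_2(t)\bigr)\,d\psi .$

Similarly, if Y were observed first and the optimal rule followed for the
remaining n-1 steps, the expected sum would be

(5.1.2)  $B_n = \zeta \int_{-\infty}^{\infty} t g_1(t)\,d\psi + (1-\zeta) \int_{-\infty}^{\infty} t g_2(t)\,d\psi$
         $\qquad + \int_{-\infty}^{\infty} W_{n-1}\!\left(\frac{\zeta g_1(t)}{\zeta g_1(t) + (1-\zeta) g_2(t)}\right) \bigl(\zeta g_1(t) + (1-\zeta) g_2(t)\bigr)\,d\psi .$

Hence, $W_n(\zeta) = \max(A_n, B_n)$.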
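
The recursion above, in which the value of the optimal design is the larger of the "X first" and "Y first" expected sums, can be evaluated directly when X and Y take only finitely many values. The following Python sketch makes that simplifying assumption (the text itself allows general densities); the function name `optimal_value` and the dictionary encoding of the distributions are illustrative and not part of the original.

```python
def optimal_value(n, zeta, f1, f2, g1, g2):
    """Expected sum of n observations under the optimal design.

    f1, f2: dicts mapping each value of X to its probability under H1, H2.
    g1, g2: the same for Y. zeta is the a priori probability of H1.
    Assumes finitely many possible values (an illustrative restriction).
    """
    def observe_first(k, z, d1, d2):
        # Immediate observation plus the optimal continuation, averaged
        # over the marginal distribution of the observed value.
        total = 0.0
        for t in set(d1) | set(d2):
            marginal = z * d1.get(t, 0.0) + (1 - z) * d2.get(t, 0.0)
            if marginal > 0:
                posterior = z * d1.get(t, 0.0) / marginal
                total += marginal * (t + w(k - 1, posterior))
        return total

    def w(k, z):
        if k == 0:
            return 0.0
        a_k = observe_first(k, z, f1, f2)  # X observed first
        b_k = observe_first(k, z, g1, g2)  # Y observed first
        return max(a_k, b_k)

    return w(n, zeta)


# Illustration with 0/1 observations: X succeeds with probability 0.7 under H1
# and 0.4 under H2; Y is the reverse.
f1, f2 = {1: 0.7, 0: 0.3}, {1: 0.4, 0: 0.6}
g1, g2 = f2, f1
print(optimal_value(1, 0.8, f1, f2, g1, g2))
print(optimal_value(3, 0.8, f1, f2, g1, g2))
```

The running time grows exponentially in n, so the sketch is meant only for small n.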

A natural design to be considered is that which requires that one
maximize step by step, i.e., after the jth observation the a posteriori
probability, $\zeta_j$, is computed and at the next step one observes the random
variable corresponding to the maximum of $\int t\,(\zeta_j f_1(t) + (1-\zeta_j) f_2(t))\,d\psi$
and $\int t\,(\zeta_j g_1(t) + (1-\zeta_j) g_2(t))\,d\psi$. Denote this stepwise maximization
design by $\mathcal{S}$.
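
One step of this stepwise rule consists of a comparison of two expected values followed by a Bayes update. The Python sketch below illustrates it for distributions with finitely many values; the helper names are hypothetical, and sending ties to X is an assumption, since the text does not say how ties in the two expected values are to be broken.

```python
def expected_value(z, d1, d2):
    """Mean of an observation when H1 has probability z.
    d1, d2: dicts value -> probability under H1 and H2."""
    return sum(t * (z * d1.get(t, 0.0) + (1 - z) * d2.get(t, 0.0))
               for t in set(d1) | set(d2))


def stepwise_choice(z, f1, f2, g1, g2):
    """Return 'X' or 'Y', whichever has the larger current expected value.
    Ties go to X (an assumption)."""
    return "X" if expected_value(z, f1, f2) >= expected_value(z, g1, g2) else "Y"


def posterior(z, d1, d2, t):
    """A posteriori probability of H1 after the value t is observed."""
    numerator = z * d1.get(t, 0.0)
    return numerator / (numerator + (1 - z) * d2.get(t, 0.0))


# X succeeds w.p. 0.7 under H1 and 0.4 under H2; Y is the reverse.
f1, f2 = {1: 0.7, 0: 0.3}, {1: 0.4, 0: 0.6}
g1, g2 = f2, f1

z0 = 0.4
first = stepwise_choice(z0, f1, f2, g1, g2)   # Y has the larger mean here
z1 = posterior(z0, g1, g2, 0)                 # Y is observed and fails
second = stepwise_choice(z1, f1, f2, g1, g2)  # the failure shifts the choice to X
print(first, z1, second)
```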


Theorem 5.1. If the likelihood ratios $f_1(X)/f_2(X)$ and $g_1(Y)/g_2(Y)$ have the same
distributions under $H_1$ and also under $H_2$, then the stepwise maximization
design $\mathcal{S}$ is the optimal design.

Proof. Since,

(1)  $\zeta_1 = \frac{\zeta f_1(x)}{\zeta f_1(x) + (1-\zeta) f_2(x)}$   if X is observed first,
     $\zeta_1 = \frac{\zeta g_1(y)}{\zeta g_1(y) + (1-\zeta) g_2(y)}$   if Y is observed first,

and the likelihood ratios have the same distributions, the distribution
of $\zeta_1$ is independent of which random variable is observed first. Hence,
the expected value of the optimal yield from the last n-1 steps is independent
of the choice for the first step. One can, therefore, maximize the
expected sum of n observations by choosing at the first step the random
variable having the larger expected value and continuing with the optimal
design for the remaining steps.

Since all the random variables are assumed to be independent, the same
argument shows that, given $\zeta_j$, it is optimal to follow $\mathcal{S}$ for the (j+1)st
step.


An example in which the likelihood ratios are distributed alike is:

               X              Y

   $H_1$    $N(0,1)$      $N(\mu,1)$

   $H_2$    $N(\mu,1)$    $N(0,1)$ .

If the above example is modified to destroy the symmetry, e.g.,

               X              Y

   $H_1$    $N(0,1)$      $N(\lambda,1)$

   $H_2$    $N(\mu,1)$    $N(0,1)$      ($\mu \ne \lambda$, $\mu > 0$, $\lambda > 0$),



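then the stepwise maximization design $\mathcal{S}$ is not optimal. If it were, then
for n = 2 and $\zeta$ such that $(1-\zeta)\mu = \zeta\lambda$ there would be indifference as to
the first step. If X is observed at the first step, then the expected sum
of the two observations is

(5.1.3)  $(1-\zeta)\mu + (1-\zeta)\mu \Pr\bigl(\zeta_1\lambda < (1-\zeta_1)\mu \mid H_2\bigr) + \zeta\lambda \Pr\bigl(\zeta_1\lambda > (1-\zeta_1)\mu \mid H_1\bigr)$

or

         $(1-\zeta)\mu + \frac{(1-\zeta)\mu}{\sqrt{2\pi}} \int_{-\mu/2}^{\infty} e^{-t^2/2}\,dt + \frac{\zeta\lambda}{\sqrt{2\pi}} \int_{-\infty}^{\mu/2} e^{-t^2/2}\,dt .$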


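If, on the other hand, Y is observed first, the expected sum would be

(5.1.4)  $\zeta\lambda + \frac{\zeta\lambda}{\sqrt{2\pi}} \int_{-\lambda/2}^{\infty} e^{-t^2/2}\,dt + \frac{(1-\zeta)\mu}{\sqrt{2\pi}} \int_{-\infty}^{\lambda/2} e^{-t^2/2}\,dt .$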


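But since $(1-\zeta)\mu = \zeta\lambda$, if the stepwise maximization design were optimal
there would be indifference as to the first step and hence

(5.1.5)  $\int_{-\mu/2}^{\infty} e^{-t^2/2}\,dt + \int_{-\infty}^{\mu/2} e^{-t^2/2}\,dt = \int_{-\lambda/2}^{\infty} e^{-t^2/2}\,dt + \int_{-\infty}^{\lambda/2} e^{-t^2/2}\,dt$

which implies that $\lambda = \mu$.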
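
The counterexample can be checked numerically. The Python sketch below assumes the modified normal example reads as follows: under $H_1$, X is N(0,1) and Y is N(λ,1); under $H_2$, X is N(μ,1) and Y is N(0,1). It also assumes that the two-observation expected sums in (5.1.3) and (5.1.4) reduce to expressions in the standard normal c.d.f. as coded here. The function names are illustrative.

```python
import math


def phi_cdf(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_step_sums(mu, lam):
    """Expected sum of two observations with X first and with Y first,
    at the prior zeta for which (1 - zeta) * mu == zeta * lam, so that
    the first-step expected values of X and Y coincide."""
    zeta = mu / (mu + lam)
    # X first: after seeing x, the second step takes X again when x > mu/2
    # and Y otherwise (assumed reading of (5.1.3)).
    x_first = ((1 - zeta) * mu
               + (1 - zeta) * mu * phi_cdf(mu / 2)
               + zeta * lam * phi_cdf(mu / 2))
    # Y first: after seeing y, the second step takes Y again when y > lam/2
    # and X otherwise (assumed reading of (5.1.4)).
    y_first = (zeta * lam
               + zeta * lam * phi_cdf(lam / 2)
               + (1 - zeta) * mu * phi_cdf(lam / 2))
    return x_first, y_first


print(two_step_sums(1.0, 1.0))  # symmetric case: the two sums agree
print(two_step_sums(1.0, 2.0))  # asymmetric case: the two sums differ
```

When λ differs from μ the two sums differ even though the first-step expected values are equal, which is the failure of indifference the argument relies on.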

5.2. The 'Two-armed Bandit' in the Binomial Case. A special case of
the Two-armed Bandit of widespread interest is that in which the random
variables have binomial distributions with parameters given by:

             X     Y

   $H_1$     p     q

   $H_2$     q     p

A second example in which the likelihood ratios are distributed alike
is furnished here in the case p + q = 1. Hence, for that case, the stepwise
maximization design $\mathcal{S}$ is the optimal design. Indeed, it is a conjecture of
Blackwell's that in any case $\mathcal{S}$ is the optimal design.
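
The claim about the case p + q = 1 can be verified by exact arithmetic. The Python sketch below assumes that the likelihood ratios in question are $f_1(X)/f_2(X)$ and $g_1(Y)/g_2(Y)$, with X a 0/1 variable with success probability p under $H_1$ and q under $H_2$, and Y the reverse. The function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def ratio_distribution(d1, d2, under):
    """Distribution of d1(T)/d2(T) when T is drawn from `under`.
    All arguments are dicts mapping a value to its probability."""
    dist = defaultdict(Fraction)
    for t, prob in under.items():
        dist[d1[t] / d2[t]] += prob
    return dict(dist)


def ratios_alike(f1, f2, g1, g2):
    """True when f1(X)/f2(X) and g1(Y)/g2(Y) have the same distribution
    under H1 and also under H2."""
    same_under_h1 = ratio_distribution(f1, f2, f1) == ratio_distribution(g1, g2, g1)
    same_under_h2 = ratio_distribution(f1, f2, f2) == ratio_distribution(g1, g2, g2)
    return same_under_h1 and same_under_h2


def binomial_case(p, q):
    """Distributions (f1, f2, g1, g2) of the table, with 0 < p, q < 1."""
    f1 = {1: p, 0: 1 - p}  # X under H1
    f2 = {1: q, 0: 1 - q}  # X under H2
    return f1, f2, f2, f1  # Y under H1 equals f2, Y under H2 equals f1


print(ratios_alike(*binomial_case(Fraction(7, 10), Fraction(3, 10))))
print(ratios_alike(*binomial_case(Fraction(7, 10), Fraction(4, 10))))
```

Exact fractions are used so that the comparison of distributions is not disturbed by rounding.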

Before considering the question of the stepwise maximization design $\mathcal{S}$
being optimal it will be shown that it has the desirable property of being
consistent.

Theorem 5.2. Following the design $\mathcal{S}$, the expected value of the
average of the first n observations converges to max(p,q) as $n \to \infty$.

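Proof. Assume that p > q. Then,

(1)  if $\zeta > 1/2$,
     $W_n(\zeta, \mathcal{S}) = \zeta p + (1-\zeta) q + P_\zeta(X = 1)\, W_{n-1}\!\left(\frac{\zeta p}{\zeta p + (1-\zeta) q}, \mathcal{S}\right) + P_\zeta(X = 0)\, W_{n-1}\!\left(\frac{\zeta (1-p)}{\zeta (1-p) + (1-\zeta)(1-q)}, \mathcal{S}\right),$

(2)  if $\zeta < 1/2$,
     $W_n(\zeta, \mathcal{S}) = \zeta q + (1-\zeta) p + P_\zeta(Y = 1)\, W_{n-1}\!\left(\frac{\zeta q}{\zeta q + (1-\zeta) p}, \mathcal{S}\right) + P_\zeta(Y = 0)\, W_{n-1}\!\left(\frac{\zeta (1-q)}{\zeta (1-q) + (1-\zeta)(1-p)}, \mathcal{S}\right),$

where $P_\zeta(Z = c) = \zeta P(Z = c \mid H_1) + (1-\zeta) P(Z = c \mid H_2)$ and $W_n(\zeta, \mathcal{S})$ is
the expected sum of n observations when the stepwise maximization design
$\mathcal{S}$ is followed.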

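Let $a_n(\zeta, \mathcal{S}) = \frac{1}{n} W_n(\zeta, \mathcal{S})$. Then $a_n(\zeta, \mathcal{S})$ is monotone increasing
in n and is bounded from above by p for all n. Let $a(\zeta, \mathcal{S}) = \lim_{n \to \infty} a_n(\zeta, \mathcal{S})$.
Since $a_n(\zeta, \mathcal{S})$ is convex and continuous in $\zeta$ for $0 \le \zeta \le 1$, $a(\zeta, \mathcal{S})$ is
also. Further, since $n\,a_n(\zeta, \mathcal{S})$ satisfies (1) and (2),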


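(3)  $a(\zeta, \mathcal{S}) = P_\zeta(X = 1)\, a\!\left(\frac{\zeta p}{\zeta p + (1-\zeta) q}, \mathcal{S}\right) + P_\zeta(X = 0)\, a\!\left(\frac{\zeta (1-p)}{\zeta (1-p) + (1-\zeta)(1-q)}, \mathcal{S}\right)$   if $\zeta > 1/2$,
     $a(\zeta, \mathcal{S}) = P_\zeta(Y = 1)\, a\!\left(\frac{\zeta q}{\zeta q + (1-\zeta) p}, \mathcal{S}\right) + P_\zeta(Y = 0)\, a\!\left(\frac{\zeta (1-q)}{\zeta (1-q) + (1-\zeta)(1-p)}, \mathcal{S}\right)$   if $\zeta < 1/2$.

Suppose that the minimum of $a(\zeta, \mathcal{S})$ is assumed at $\zeta_0 > 1/2$. Then
it also assumes its minimum at $\frac{p \zeta_0}{p \zeta_0 + q (1-\zeta_0)} > \zeta_0$. By iteration, it
assumes its minimum at $\frac{p^n \zeta_0}{p^n \zeta_0 + q^n (1-\zeta_0)}$, which tends to 1 as $n \to \infty$. Hence,
$\zeta_0$ could be taken to be 1. If, on the other hand, $\zeta_0 < 1/2$, the analogous
procedure shows that $\zeta_0$ could be taken to be 0. Thus, the minimum of
$a(\zeta, \mathcal{S})$ is assumed either at 0 or 1. But $a(0, \mathcal{S}) = a(1, \mathcal{S}) = p$, which
establishes the theorem.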
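
The convergence asserted in Theorem 5.2 can be observed numerically. The Python sketch below computes the exact expected average of the first n observations under the stepwise rule in the binomial case, by recursion on the success and failure counts of X and Y. The function name is illustrative, and sending ties to X is an assumption, since the text does not say how ties are to be broken.

```python
from functools import lru_cache


def stepwise_average(n, zeta, p, q):
    """Expected average of n observations under the stepwise rule, where X
    succeeds with probability p under H1 and q under H2, and Y the reverse."""

    def posterior(sx, fx, sy, fy):
        # Probability of H1 given the success/failure counts on X and on Y.
        like1 = zeta * p**sx * (1 - p)**fx * q**sy * (1 - q)**fy
        like2 = (1 - zeta) * q**sx * (1 - q)**fx * p**sy * (1 - p)**fy
        return like1 / (like1 + like2)

    @lru_cache(maxsize=None)
    def remaining(k, sx, fx, sy, fy):
        # Expected sum of the next k observations from the current counts.
        if k == 0:
            return 0.0
        z = posterior(sx, fx, sy, fy)
        mean_x = z * p + (1 - z) * q
        mean_y = z * q + (1 - z) * p
        if mean_x >= mean_y:  # ties go to X (assumption)
            return (mean_x * (1 + remaining(k - 1, sx + 1, fx, sy, fy))
                    + (1 - mean_x) * remaining(k - 1, sx, fx + 1, sy, fy))
        return (mean_y * (1 + remaining(k - 1, sx, fx, sy + 1, fy))
                + (1 - mean_y) * remaining(k - 1, sx, fx, sy, fy + 1))

    return remaining(n, 0, 0, 0, 0) / n


for n in (1, 2, 10, 40):
    print(n, stepwise_average(n, 0.5, 0.8, 0.3))
```

With p = 0.8 and q = 0.3 the printed averages stay below 0.8 and approach it as n grows.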

If one lets $Z_n$ be the average of the first n observations then, if the
stepwise maximization design $\mathcal{S}$ is used, $E(Z_n) \to p$. Furthermore, [...], so that
the sequence $\{Z_n\}$ forms a lower semimartingale. From the results of
martingale theory [7] it can be concluded also that $Z_n \to p$ with probability 1.

5.3. The Question of Optimal Design. Company was joined with that
sizeable group who have jousted with the problem of finding the optimal
design for the Two-armed Bandit problem as described in the preceding
section. Efforts were directed to proving Blackwell's conjecture that
the stepwise maximization design $\mathcal{S}$ is optimal and, while not meeting with
complete success, the belief remains that the conjecture is correct. A brief
outline of two lines of attack follows.

Let $X\mathcal{S}$ denote that design calling for X first followed by the use
of the stepwise maximization design $\mathcal{S}$ for the remaining steps, and let
$Y\mathcal{S}$ be defined similarly. Consider $W_m(\zeta, X\mathcal{S}) - W_m(\zeta, Y\mathcal{S})$; it is
equal to $(2\zeta - 1)(p - q)$ for m = 1. The induction hypothesis was made that
the difference was positive for $\zeta > 1/2$ for all m < n. In computing this
difference for small values of n it appeared that the case $p^n(1-p) < q^n(1-q)$
gave the smallest difference. For $\zeta$ near 1/2, but greater than 1/2, and
$p^n(1-p) < q^n(1-q)$, $\mathcal{S}$ coincides with the rule, $\mathcal{R}$, which requires that when
an observation is a success (1 is observed) the same random variable is to
be observed at the next step, but if a failure occurs (0 is observed) then
at the next step observe that random variable which has failed the fewest
times or, if the numbers of failures are the same, observe next the one which
has succeeded the greatest number of times; in case of ties, observe X
next. It is easily established by induction that

(5.3.1)  $W_n(\zeta, X\mathcal{R}) - W_n(\zeta, Y\mathcal{R}) = (2\zeta - 1)(p^n - q^n) .$
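
Identity (5.3.1) can be checked by direct enumeration. The Python sketch below assumes the left-hand side is the difference between the expected sums when play starts with X and when it starts with Y, both continuing under the success/failure rule just described. Because that rule does not depend on the prior, the difference is computed under each hypothesis separately and then mixed with weights $\zeta$ and $1-\zeta$. The function names are illustrative.

```python
from functools import lru_cache


def rule_sum(n, first, px, py):
    """Expected number of successes in n plays that start with `first` and
    then follow the rule. px, py: success probabilities of X and Y under
    one fixed hypothesis."""

    def next_arm(arm, success, sx, fx, sy, fy):
        if success:
            return arm                      # stay after a success
        if fx != fy:
            return "X" if fx < fy else "Y"  # fewest failures
        if sx != sy:
            return "X" if sx > sy else "Y"  # then most successes
        return "X"                          # remaining ties go to X

    @lru_cache(maxsize=None)
    def go(k, arm, sx, fx, sy, fy):
        if k == 0:
            return 0.0
        pr = px if arm == "X" else py
        total = 0.0
        for success, weight in ((1, pr), (0, 1 - pr)):
            if arm == "X":
                counts = (sx + success, fx + 1 - success, sy, fy)
            else:
                counts = (sx, fx, sy + success, fy + 1 - success)
            following = next_arm(arm, success, *counts)
            total += weight * (success + go(k - 1, following, *counts))
        return total

    return go(n, first, 0, 0, 0, 0)


def difference(n, zeta, p, q):
    """Expected sum starting with X minus expected sum starting with Y."""
    under_h1 = rule_sum(n, "X", p, q) - rule_sum(n, "Y", p, q)
    under_h2 = rule_sum(n, "X", q, p) - rule_sum(n, "Y", q, p)
    return zeta * under_h1 + (1 - zeta) * under_h2


for n in range(1, 6):
    print(n, difference(n, 0.6, 0.7, 0.4), (2 * 0.6 - 1) * (0.7**n - 0.4**n))
```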

Efforts were directed towards showing that

         $W_n(\zeta, X\mathcal{S}) - W_n(\zeta, Y\mathcal{S}) > 0 ,$

at least for $\zeta$ near 1/2 and greater than 1/2, by considering the adjustments
which would have to be made in play according to $X\mathcal{R}$ and $Y\mathcal{R}$ to make them
coincide with play according to $X\mathcal{S}$ and $Y\mathcal{S}$, respectively, where $\mathcal{S}$ is
the stepwise maximization design. For adjustments in $Y\mathcal{R}$ required at
points where $\mathcal{R}$ called for Y but $\mathcal{S}$ called for X, and for the symmetric
points in adjusting the play starting with X, it was possible to establish
that the adjustments were of the proper sign. For the other types of
adjustments the attempt to prove that the signs were such as would
accomplish the proof was unsuccessful. However, it appeared in the work
that the difficult adjustments did not arise for $n \le 5$ and that for n up
to 8 they could be satisfactorily accounted for; hence, the conjecture
holds for n < 9.
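
For small n the conjecture can also be examined by brute force. The Python sketch below compares, in the binomial case, the expected sum under the optimal design (obtained by maximizing over the choice of X or Y at every step) with the expected sum under the stepwise rule. It is a numerical check for particular values of n, p, q and the prior, not a proof. The function name is illustrative, and sending ties to X is an assumption.

```python
def compare_designs(n, zeta, p, q):
    """Return (optimal expected sum, expected sum under the stepwise rule)
    for n observations, with 0 < p, q < 1."""

    def branches(z, arm):
        # Success probabilities of the chosen arm under H1 and H2.
        s1, s2 = (p, q) if arm == "X" else (q, p)
        win = z * s1 + (1 - z) * s2
        # (probability, observed value, a posteriori probability of H1)
        yield win, 1, z * s1 / win
        yield 1 - win, 0, z * (1 - s1) / (1 - win)

    def value(k, z, arm, continuation):
        return sum(weight * (result + continuation(k - 1, z_next))
                   for weight, result, z_next in branches(z, arm))

    def optimal(k, z):
        if k == 0:
            return 0.0
        return max(value(k, z, "X", optimal), value(k, z, "Y", optimal))

    def stepwise(k, z):
        if k == 0:
            return 0.0
        mean_x = z * p + (1 - z) * q
        mean_y = z * q + (1 - z) * p
        arm = "X" if mean_x >= mean_y else "Y"  # ties go to X (assumption)
        return value(k, z, arm, stepwise)

    return optimal(n, zeta), stepwise(n, zeta)


for n in range(1, 7):
    print(n, compare_designs(n, 0.6, 0.7, 0.4))
```

The cost grows like $4^n$, so the sketch is practical only for the small n discussed above.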

A second attack was made along the following line. For $\zeta > 1/2$ but
near 1/2, let $k_1$ be that number such that if at least $k_1$ successes precede
the first failure then the random variable which failed is observed at the
next step; let $k_2$ be that number such that if there have been at least $k_2$
successes before the second failure then the random variable which failed
is observed at the next step; etc. Then the contribution to $W_n(\zeta, X\mathcal{S})$
and to $W_n(\zeta, Y\mathcal{S})$ from those sequences containing no failures, one failure,
two failures, and three failures was computed and combined successively to
obtain the expression for their contribution to $W_n(\zeta, X\mathcal{S}) - W_n(\zeta, Y\mathcal{S})$.
From the early work it appeared that an inductive pattern would persist in
these expressions, as sequences containing more and more failures were added,
which would allow one to write out the difference in terms of the $k_i$
and verify that it was positive. Upon reaching sequences containing
three and four failures the attempt to force the contributions into the
previously noted patterns has been unsuccessful.